Silent Paralinguistics


Basic Information

Project Abstract

Speech is a natural human ability and a core part of what makes us a social species. Concealing the speaker's lips behind a face mask lowers listener performance and confidence while increasing perceptual effort. Hearing-impaired listeners and non-native speakers face even greater challenges. Beyond these issues, masks impede the paralinguistics of interpersonal communication, i.e. the way something is said. For acoustic speech, paralinguistic information can be recognized automatically using Computational Paralinguistics methods.

Silent Speech Interfaces (SSIs) enable spoken communication even when the acoustic signal is severely degraded or unavailable. SSIs aim to generate speech for silent speakers and otherwise mute individuals from biosignals that result from the speech production process itself. Such speech-related biosignals encompass signals from the articulators, articulatory muscle activity, neural pathways and the brain itself. Surface electromyography (EMG), which captures the activity of the articulatory muscles, has been successfully applied to SSIs. With EMG-based SSIs, silently spoken speech is converted into text or directly into audible speech. Despite major advances, the lack of paralinguistics remains a major issue for SSI users.

In this proposal, we combine Silent Speech Interfaces with Computational Paralinguistics to lay the foundation for “Silent Paralinguistics (SP)”. SP aims, first, to infer speaker states and traits from speech-related biosignals during silent speech production and, second, to use this inferred paralinguistic information for a more natural SSI-based spoken conversation.

We will study politeness and frustration as speaker states, as well as identity and personality as speaker traits. As a basis for the development of SP methods, we will record and label data from 100 participants, from whom we will elicit polite speech through game scenarios and frustration through infuriating game elements. Based on these data, we will investigate how well speaker states and traits can be predicted from EMG signals of silently produced speech. To this end, we will study and compare two approaches: direct SP, which predicts traits and states directly from the EMG features, and indirect SP, which first converts EMG into acoustic features and then predicts traits and states from the acoustic features.

Furthermore, we will optimize the integration of paralinguistic predictions into the SSI to generate the most appropriate acoustic signals. Deep generative models for multi-speaker EMG-to-speech conversion will be conditioned on trait and state predictions, such that the produced acoustic signals reflect the intended affective meaning. Finally, an EMG-SSI prototype will be established to validate whether the SP-enhanced acoustic speech signal improves the usability of spoken communication in terms of naturalness and user acceptance.
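To make the two approaches concrete, the sketch below contrasts a direct SP pipeline (EMG features → state/trait classifier) with an indirect one (EMG → acoustic features → classifier). It is a minimal illustration under assumed inputs: the feature extractors, the linear "conversion" stand-in, and the toy data are hypothetical and not taken from the project.

```python
# Minimal illustrative sketch only, not the project's implementation: all
# function names, feature choices, and the random toy data are hypothetical.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical shared conversion front-end: a fixed linear projection standing
# in for a trained EMG-to-acoustic model (e.g. one producing MFCC-like frames).
N_CHANNELS, N_ACOUSTIC_DIMS = 6, 13
PROJECTION = rng.standard_normal((N_CHANNELS, N_ACOUSTIC_DIMS))

def extract_emg_features(emg_frames):
    """Utterance-level EMG features: per-channel mean and standard deviation."""
    return np.concatenate([emg_frames.mean(axis=0), emg_frames.std(axis=0)])

def emg_to_acoustic_features(emg_frames):
    """Indirect route: map EMG frames to acoustic-like frames, then summarize."""
    acoustic_frames = emg_frames @ PROJECTION
    return np.concatenate([acoustic_frames.mean(axis=0), acoustic_frames.std(axis=0)])

# Toy data: 40 silent utterances (200 frames x 6 EMG channels each) with a
# binary speaker-state label, e.g. frustrated vs. not frustrated.
utterances = [rng.standard_normal((200, N_CHANNELS)) for _ in range(40)]
states = rng.integers(0, 2, size=40)

# Direct SP: predict the speaker state straight from EMG-derived features.
X_direct = np.stack([extract_emg_features(u) for u in utterances])
direct_clf = make_pipeline(StandardScaler(), SVC()).fit(X_direct, states)

# Indirect SP: first convert EMG to acoustic features, then predict the state.
X_indirect = np.stack([emg_to_acoustic_features(u) for u in utterances])
indirect_clf = make_pipeline(StandardScaler(), SVC()).fit(X_indirect, states)
```

Both routes end in the same kind of classifier, so a comparison of this form isolates the effect of the intermediate acoustic representation rather than the choice of recognizer.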

Project Outcomes

Number of journal articles (0)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)

Other Grants of Professor Dr.-Ing. Björn Schuller

Kontextsensitive automatische Erkennung spontaner Sprache mit BLSTM-Netzwerken
(Context-sensitive automatic recognition of spontaneous speech with BLSTM networks)
  • Grant number: 193507010
  • Fiscal year: 2011
  • Funding amount: --
  • Project category: Research Grants
Nichtnegative Matrix-Faktorisierung zur störrobusten Merkmalsextraktion in der Sprachverarbeitung
(Non-negative matrix factorisation for noise-robust feature extraction in speech processing)
  • Grant number: 168309859
  • Fiscal year: 2010
  • Funding amount: --
  • Project category: Research Grants
Agent-based Unsupervised Deep Interactive 0-shot-learning Networks Optimising Machines' Ontological Understanding of Sound (AUDI0NOMOUS)
  • Grant number: 442218748
  • Fiscal year:
  • Funding amount: --
  • Project category: Reinhart Koselleck Projects