RI: Medium: Collaborative Research: Multilingual Gestural Models for Robust Language-Independent Speech Recognition


Basic Information

  • Award Number:
    1162525
  • Principal Investigator:
  • Amount:
    $234,900
  • Host Institution:
  • Host Institution Country:
    United States
  • Program Type:
    Standard Grant
  • Fiscal Year:
    2012
  • Funding Country:
    United States
  • Project Period:
    2012-10-01 to 2016-09-30
  • Project Status:
    Completed

Project Summary

Current state-of-the-art automatic speech recognition (ASR) systems typically model speech as a string of acoustically-defined phones and use contextualized phone units, such as tri-phones or quin-phones, to model contextual influences due to coarticulation. Such acoustic models may suffer from data sparsity and may fail to capture coarticulation appropriately because the span of a tri- or quin-phone's contextual influence is not flexible. In a small-vocabulary context, however, research has shown that ASR systems which estimate articulatory gestures from the acoustics and incorporate these gestures in the ASR process can better model coarticulation and are more robust to noise. The current project investigates the use of estimated articulatory gestures in large-vocabulary automatic speech recognition. Gestural representations of the speech signal are initially created from the acoustic waveform using the Task Dynamic model of speech production. These data are then used to train automatic models for articulatory gesture recognition, where the articulatory gestures serve as subword units in the gesture-based ASR system. The main goal of the proposed work is to evaluate the performance of a large-vocabulary gesture-based ASR system using American English (AE). The gesture-based system will be compared to a set of competitive state-of-the-art recognition systems in terms of word and phone recognition accuracies, under both clean and noisy acoustic background conditions. The broad impact of this research is threefold: (1) the creation of a large-vocabulary American English (AE) speech database containing acoustic waveforms and their articulatory representations, (2) the introduction of novel machine learning techniques to model articulatory representations from acoustic waveforms, and (3) the development of a large-vocabulary ASR system that uses articulatory representations as subword units.
The robust and accurate ASR system for AE resulting from the proposed project will deal effectively with speech variability, thereby significantly enhancing communication and collaboration between people and machines in AE, with the promise of generalizing the method to multiple languages. The knowledge gained and the systems developed will contribute to the broad application of articulatory features in speech processing, and will have the potential to transform the fields of ASR, speech-mediated person-machine interaction, and automatic translation among languages. The interdisciplinary collaboration will facilitate a cross-disciplinary learning environment for the participating faculty, researchers, graduate students, and undergraduate students. Thus, this collaboration will result in the broader impact of enhanced training in speech modeling and algorithm development. Finally, the proposed work will result in a set of databases and tools that will be disseminated to serve the research and education community at large.
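To make the data-sparsity argument above concrete, here is a minimal, hypothetical sketch (not part of the project's code) of the tri-phone context expansion the abstract refers to: each phone is modeled jointly with its left and right neighbors, using the common `left-center+right` notation, so the number of distinct units grows roughly cubically with the phone inventory.

```python
# Hypothetical illustration of tri-phone context units.
# Each phone is paired with its left and right neighbors; "sil" marks
# the silence context at the utterance boundary (an assumed convention).

def to_triphones(phones):
    """Expand a phone string into left-context/phone/right-context units."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

# "cat" -> k ae t
print(to_triphones(["k", "ae", "t"]))
# ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

With a 40-phone inventory this scheme yields on the order of 40^3 = 64,000 possible units, many of which are rarely or never observed in training data; gesture-based subword units, as proposed here, aim to model coarticulation without that combinatorial blow-up.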

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Publications by Carol Espy-Wilson

Computationally Scalable and Clinically Sound: Laying the Groundwork to Use Machine Learning Techniques for Social Media and Language Data in Predicting Psychiatric Symptoms
  • DOI:
    10.1016/j.biopsych.2022.02.146
  • Publication Date:
    2022-05-01
  • Journal:
  • Impact Factor:
  • Authors:
    Deanna Kelly;Glen Coppersmith;John Dickerson;Carol Espy-Wilson;Hanna Michel;Philip Resnik
  • Corresponding Author:
    Philip Resnik

Other Grants by Carol Espy-Wilson

Collaborative Research: Estimating Articulatory Constriction Place and Timing from Speech Acoustics
  • Award Number:
    2141413
  • Fiscal Year:
    2022
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
SCH: INT: Collaborative Research: Using Multi-Stage Learning to Prioritize Mental Health
  • Award Number:
    2124270
  • Fiscal Year:
    2021
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Speech for Robotics
  • Award Number:
    1941541
  • Fiscal Year:
    2019
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: Effects of production variability on the acoustic consequences of coordinated articulatory gestures
  • Award Number:
    1436600
  • Fiscal Year:
    2014
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
CIF: Small: Nonintrusive Digital Speech Forensics: Source Identification and Content authentication
  • Award Number:
    0917104
  • Fiscal Year:
    2009
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
RI: Extension of the APP detector for multipitch tracking and speaker separation
  • Award Number:
    0812509
  • Fiscal Year:
    2008
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
RI: Collaborative Research: Landmark-based Robust Speech Recognition Using Prosody-Guided Models of Speech Variability
  • Award Number:
    0703859
  • Fiscal Year:
    2007
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
The Development of Low-Level Speaker-Specific Information for Speaker Recognition
  • Award Number:
    0519256
  • Fiscal Year:
    2005
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
Acoustic-Phonetic Knowledge and Speech Recognition
  • Award Number:
    0236707
  • Fiscal Year:
    2003
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
SGER: Exploration of a Neurological Model to Improve the Extraction of Linguistic Features in Speech
  • Award Number:
    0233482
  • Fiscal Year:
    2002
  • Amount:
    $234,900
  • Program Type:
    Standard Grant

Similar Overseas Grants

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312841
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312842
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313151
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312840
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313149
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
  • Award Number:
    2312955
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Award Number:
    2313137
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313150
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant