RI: Medium: Collaborative Research: Multilingual Gestural Models for Robust Language-Independent Speech Recognition


Basic Information

  • Award Number:
    1162525
  • Principal Investigator:
  • Amount:
    $234,900
  • Host Institution:
  • Host Institution Country:
    United States
  • Program Type:
    Standard Grant
  • Fiscal Year:
    2012
  • Funding Country:
    United States
  • Project Period:
    2012-10-01 to 2016-09-30
  • Project Status:
    Completed

Project Summary

Current state-of-the-art automatic speech recognition (ASR) systems typically model speech as a string of acoustically-defined phones and use contextualized phone units, such as tri-phones or quin-phones, to model contextual influences due to coarticulation. Such acoustic models may suffer from data sparsity and may fail to capture coarticulation appropriately because the span of a tri- or quin-phone's contextual influence is not flexible. In a small-vocabulary context, however, research has shown that ASR systems which estimate articulatory gestures from the acoustics and incorporate these gestures in the ASR process can better model coarticulation and are more robust to noise. The current project investigates the use of estimated articulatory gestures in large-vocabulary automatic speech recognition. Gestural representations of the speech signal are initially created from the acoustic waveform using the Task Dynamic model of speech production. These data are then used to train automatic models for articulatory gesture recognition, where the articulatory gestures serve as subword units in the gesture-based ASR system. The main goal of the proposed work is to evaluate the performance of a large-vocabulary gesture-based ASR system using American English (AE). The gesture-based system will be compared to a set of competitive state-of-the-art recognition systems in terms of word and phone recognition accuracies, under both clean and noisy acoustic background conditions. The broad impact of this research is threefold: (1) the creation of a large-vocabulary American English (AE) speech database containing acoustic waveforms and their articulatory representations, (2) the introduction of novel machine learning techniques to model articulatory representations from acoustic waveforms, and (3) the development of a large-vocabulary ASR system that uses articulatory representations as subword units.
The robust and accurate ASR system for AE resulting from the proposed project will deal effectively with speech variability, thereby significantly enhancing communication and collaboration between people and machines in AE, with the promise of generalizing the method to multiple languages. The knowledge gained and the systems developed will contribute to the broad application of articulatory features in speech processing, and will have the potential to transform the fields of ASR, speech-mediated person-machine interaction, and automatic translation among languages. The interdisciplinary collaboration will facilitate a cross-disciplinary learning environment for the participating faculty, researchers, graduate students, and undergraduate students. Thus, this collaboration will result in the broader impact of enhanced training in speech modeling and algorithm development. Finally, the proposed work will result in a set of databases and tools that will be disseminated to serve the research and education community at large.
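To make the data-sparsity argument above concrete, here is a minimal, hypothetical sketch (not part of the project's code) of the tri-phone context expansion the abstract refers to: each phone is modeled jointly with its left and right neighbors, using the common `left-center+right` notation, so the number of distinct units grows roughly cubically with the phone inventory.

```python
# Hypothetical illustration of tri-phone context units.
# Each phone is paired with its left and right neighbors; "sil" marks
# the silence context at the utterance boundary (an assumed convention).

def to_triphones(phones):
    """Expand a phone string into left-context/phone/right-context units."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

# "cat" -> k ae t
print(to_triphones(["k", "ae", "t"]))
# ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

With a 40-phone inventory this scheme yields on the order of 40^3 = 64,000 possible units, many of which are rarely or never observed in training data; gesture-based subword units, as proposed here, aim to model coarticulation without that combinatorial blow-up.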

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Publications by Carol Espy-Wilson

Computationally Scalable and Clinically Sound: Laying the Groundwork to Use Machine Learning Techniques for Social Media and Language Data in Predicting Psychiatric Symptoms
  • DOI:
    10.1016/j.biopsych.2022.02.146
  • Publication Date:
    2022-05-01
  • Journal:
  • Impact Factor:
  • Authors:
    Deanna Kelly;Glen Coppersmith;John Dickerson;Carol Espy-Wilson;Hanna Michel;Philip Resnik
  • Corresponding Author:
    Philip Resnik

Other Grants by Carol Espy-Wilson

Collaborative Research: Estimating Articulatory Constriction Place and Timing from Speech Acoustics
  • Award Number:
    2141413
  • Fiscal Year:
    2022
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
SCH: INT: Collaborative Research: Using Multi-Stage Learning to Prioritize Mental Health
  • Award Number:
    2124270
  • Fiscal Year:
    2021
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Speech for Robotics
  • Award Number:
    1941541
  • Fiscal Year:
    2019
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: Effects of production variability on the acoustic consequences of coordinated articulatory gestures
  • Award Number:
    1436600
  • Fiscal Year:
    2014
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
CIF: Small: Nonintrusive Digital Speech Forensics: Source Identification and Content authentication
  • Award Number:
    0917104
  • Fiscal Year:
    2009
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
RI: Extension of the APP detector for multipitch tracking and speaker separation
  • Award Number:
    0812509
  • Fiscal Year:
    2008
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
RI: Collaborative Research: Landmark-based Robust Speech Recognition Using Prosody-Guided Models of Speech Variability
  • Award Number:
    0703859
  • Fiscal Year:
    2007
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
The Development of Low-Level Speaker-Specific Information for Speaker Recognition
  • Award Number:
    0519256
  • Fiscal Year:
    2005
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
Acoustic-Phonetic Knowledge and Speech Recognition
  • Award Number:
    0236707
  • Fiscal Year:
    2003
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
SGER: Exploration of a Neurological Model to Improve the Extraction of Linguistic Features in Speech
  • Award Number:
    0233482
  • Fiscal Year:
    2002
  • Amount:
    $234,900
  • Program Type:
    Standard Grant

Similar Overseas Grants

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312841
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312842
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313151
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312840
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313149
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
  • Award Number:
    2312955
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Award Number:
    2313137
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313150
  • Fiscal Year:
    2023
  • Amount:
    $234,900
  • Program Type:
    Continuing Grant