权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

RI: Medium: Collaborative Research: Multilingual Gestural Models for Robust Language-Independent Speech Recognition

RI：媒介：协作研究：用于鲁棒语言无关语音识别的多语言手势模型

基本信息

批准号：
1162033
负责人：
Elliot Saltzman
金额：
$ 5.26万
依托单位：
Trustees of Boston University
依托单位国家：
美国
项目类别：
Standard Grant
财政年份：
2012
资助国家：
美国
起止时间：
2012-10-01 至 2017-09-30
项目状态：
已结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1162033&HistoricalAwards=false
关键词：
RI Medium Collaborative Research Multilingual

项目摘要

Current state-of-the-art automatic speech recognition (ASR) systems typically model speech as a string of acoustically-defined phones and use contextualized phone units, such as tri-phones or quin-phones to model contextual influences due to coarticulation. Such acoustic models may suffer from data sparsity and may fail to capture coarticulation appropriately because the span of a tri- or quin-phone's contextual influence is not flexible. In a small vocabulary context, however, research has shown that ASR systems which estimate articulatory gestures from the acoustics and incorporate these gestures in the ASR process can better model coarticulation and are more robust to noise. The current project investigates the use of estimated articulatory gestures in large vocabulary automatic speech recognition. Gestural representations of the speech signal are initially created from the acoustic waveform using the Task Dynamic model of speech production. These data are then used to train automatic models for articulatory gesture recognition where the articulatory gestures serve as subword units in the gesture-based ASR system. The main goal of the proposed work is to evaluate the performance of a large-vocabulary gesture-based ASR system using American English (AE). The gesture-based system will be compared to a set of competitive state-of-the-art recognition systems in term of word and phone recognition accuracies, both under clean and noisy acoustic background conditions.The broad impact of this research is threefold: (1) the creation of a large vocabulary American English (AE) speech database containing acoustic waveforms and their articulatory representations, (2) the introduction of novel machine learning techniques to model articulatory representations from acoustic waveforms, and (3) the development of a large vocabulary ASR system that uses articulatory representation as subword units. The robust and accurate ASR system for AE resulting from the proposed project will deal effectively with speech variability, thereby significantly enhancing communication and collaboration between people and machines in AE, and with the promise to generalize the method to multiple languages. The knowledge gained and the systems developed will contribute to the broad application of articulatory features in speech processing, and will have the potential to transform the fields of ASR, speech-mediated person-machine interaction, and automatic translation among languages. The interdisciplinary collaboration will facilitate a cross-disciplinary learning environment for the participating faculty, researchers, graduate students and undergraduate students Thus, this collaboration will result in the broader impact of enhanced training in speech modeling and algorithm development. Finally, the proposed work will result in a set of databases and tools that will be disseminated to serve the research and education community at large.

当前最先进的自动语音识别（ASR）系统通常将语音建模为一串声学定义的电话，并使用上下文化的电话单元，如三电话或双电话，来模拟由于协同发音而产生的上下文影响。这种声学模型可能会受到数据稀疏的影响，并且可能无法适当地捕捉协同发音，因为三或四手机的上下文影响范围不灵活。然而，在词汇量较少的环境中，研究表明，从声学中估计发音手势并将这些手势纳入ASR过程的ASR系统可以更好地模拟协同发音，并且对噪声更具鲁棒性。目前的项目研究了估计发音手势在大词汇自动语音识别中的使用。语音信号的手势表示最初是使用语音生成的任务动态模型从声学波形中创建的。这些数据随后被用于训练发音手势识别的自动模型，其中发音手势作为基于手势的ASR系统的子词单位。本研究的主要目的是评估一个基于大词汇量手势的美国英语ASR系统的性能。在干净和嘈杂的声学背景条件下，基于手势的系统将与一组具有竞争力的最先进的识别系统在单词和电话识别精度方面进行比较。本研究的广泛影响有三个方面：(1)创建了一个包含声波波形及其发音表示的大词汇美式英语（AE）语音数据库，(2)引入了新的机器学习技术来模拟声波波形的发音表示，以及(3)开发了一个使用发音表示作为子词单位的大词汇ASR系统。本课题所提出的针对声发射的鲁棒准确的ASR系统将有效地处理语音变异性，从而显著增强声发射中人与机器之间的沟通和协作，并有望将该方法推广到多种语言。所获得的知识和开发的系统将有助于发音特征在语音处理中的广泛应用，并将有可能改变语音识别、语音介导的人机交互和语言间自动翻译等领域。跨学科合作将为参与的教师、研究人员、研究生和本科生提供跨学科的学习环境，因此，这种合作将在语音建模和算法开发方面产生更广泛的影响。最后，拟议的工作将产生一套数据库和工具，这些数据库和工具将被分发给整个研究和教育界。