RI: Small: Modeling Coarticulation for Automatic Speech Recognition

Basic Information

Award number:
0915754
Principal investigator:
Alexander Kain
Amount:
$450,000
Host institution:
Oregon Health & Science University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2009
Funding country:
United States
Period:
2009-09-01 to 2013-08-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0915754&HistoricalAwards=false
Keywords:
RI Small Modeling Coarticulation Automatic

Abstract

This project focuses on applying a model used in text-to-speech synthesis (TTS) to the task of automatic speech recognition (ASR). The standard method in ASR for addressing variability due to phonemic context, or "coarticulation," requires a large amount of training data and is sensitive to differences between training and testing conditions. Despite the effective use of stochastic models, current ASR systems are often unable to sufficiently account for the large degree of variability observed in speech. In many cases, this variability is not due to random factors, but is due to predictable changes in the speech signal. These factors are currently modeled in order to generate speech via TTS, but they are not yet modeled in order to recognize speech, largely because of non-local dependencies. We apply the Asynchronous Interpolation Model (AIM) used in TTS to the task of speech recognition by decomposing the speech signal into target vectors and weight trajectories, and then searching weight-trajectory and stochastic target-vector models for the highest-probability match to the input signal. The goal of this research is to improve the robustness of ASR to variability that is due to phonemic and lexical context. This improvement will increase the use of ASR technology in automated information access by telephone, educational software, and universal access for individuals with visual, auditory, or speech-production challenges. More effective models of coarticulation may increase our understanding of both human speech perception and speech production. Results from this project are disseminated through technical papers and the CSLU Toolkit software package.
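
The decomposition into target vectors and weight trajectories can be illustrated with a toy sketch. This is not the project's implementation or the CSLU Toolkit API: the function names, the sigmoid shape of the weight trajectories, and the two-dimensional target values are all assumptions made for illustration. It only shows the general idea that each feature dimension moves between two targets along its own trajectory (hence "asynchronous"), and that the trajectories can be recovered when the targets are known.

```python
import math

def sigmoid_weights(n_frames, center, slope):
    """Hypothetical weight trajectory rising from 0 to 1 around frame `center`."""
    return [1.0 / (1.0 + math.exp(-slope * (t - center))) for t in range(n_frames)]

def interpolate(target_a, target_b, weights_per_dim):
    """Build frames x[t][d] = (1 - w_d[t]) * a[d] + w_d[t] * b[d]."""
    n_frames = len(weights_per_dim[0])
    return [
        [(1.0 - w[t]) * a + w[t] * b
         for a, b, w in zip(target_a, target_b, weights_per_dim)]
        for t in range(n_frames)
    ]

def recover_weights(frames, target_a, target_b):
    """Invert the interpolation per dimension: w = (x - a) / (b - a)."""
    return [
        [(frame[d] - target_a[d]) / (target_b[d] - target_a[d]) for frame in frames]
        for d in range(len(target_a))
    ]

# Assumed two-dimensional target vectors for two neighbouring phonemes.
target_a = [500.0, 1500.0]
target_b = [300.0, 2300.0]

# Asynchronous: each dimension transitions at a different time and rate.
weights = [
    sigmoid_weights(20, center=8, slope=0.8),
    sigmoid_weights(20, center=12, slope=0.5),
]

frames = interpolate(target_a, target_b, weights)
recovered = recover_weights(frames, target_a, target_b)
```

In this sketch the targets are given, so recovering the weights is a direct inversion. The recognition task described in the abstract is harder: targets are modeled stochastically and the search is over both target-vector and weight-trajectory models.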
