
More Accurate and Efficient Analysis for Automatic Speech Recognition


Basic information

Grant number:
914-2013
Principal investigator:
O'Shaughnessy, Douglas
Amount:
approximately $21,100
Host institution:
Institut national de la recherche scientifique
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2015
Funding country:
Canada
Start and end dates:
2015-01-01 to 2016-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=568152
Keywords:
More Accurate Efficient Analysis Automatic

Project abstract

Efficient communication with machines via voice will facilitate interactions that have so far been hindered by awkward interfaces such as keyboards and telephone keypads. People have an increasing need to interact with computers, yet this is still done mostly by typing. Even attempts to seek information by telephone often require many cycles of listening to long messages and then pushing a button, because our capability to do reliable automatic speech recognition (ASR) is quite limited. For some time now, refinements to basic ASR methods established years ago have improved performance without radical changes to the underlying approaches. The recent vast increase in computational power and memory size has led researchers to attack increasingly difficult tasks, such as recognizing continuously spoken, very-large-vocabulary, speaker-independent, noisy speech over the telephone. For some limited tasks, e.g., recognizing credit card numbers, or words drawn from medium-sized vocabularies and spoken with frequent pauses, recognition accuracy exceeds 99%, and hence practical commercial products are available. However, progress has been slow on the more difficult tasks of recognizing conversational speech or noisy speech. Furthermore, recognition error rates remain high when generalized speaker-independent models are used to decode speakers who were not included in the training phase. The theme of our proposed work is increasing the robustness of ASR, that is, countering the tendency for error rates to rise when sounds other than the desired speech corrupt the input signal or when speakers not included in training use the system. It is our view that a major obstacle to greater robustness is the inadequacy of current spectral analysis methods. We propose to replace current analysis methods with a more appropriate technique that resists corruption by noise and will further allow more efficient ways to adapt the ASR models to each new speaker's voice, including speech with significant accents.
Our research, if successful, would lead to significantly improved ASR performance, in both increased accuracy and decreased computation. It would lead eventually to a much more agreeable way for anyone to interact with computers.
