More Accurate and Efficient Analysis for Automatic Speech Recognition

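Basic information

Grant number:
914-2013
Principal investigator:
OShaughnessy, Douglas
Amount:
$21.1 thousand
Host institution:
Institut national de la recherche scientifique
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2017
Funding country:
Canada
Duration:
2017-01-01 to 2018-12-31
Project status:
Completed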

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=637720
Keywords:
More Accurate Efficient Analysis Automatic

Project abstract

Efficient communication with machines via voice will facilitate interactions that so far have been hindered by awkward interfaces such as keyboards and telephone keypads. People nowadays have an increasing need to interact with computers, yet this is still done mostly by typing. Even attempts to seek information by telephone often require many cycles of listening to long messages and then pushing a button, because our capability to do reliable automatic speech recognition (ASR) is quite limited. For some time now, refinements to basic ASR methods established years ago have improved performance, without radical changes to the basic approaches. The recent vast increase in computational power and memory size has led researchers to attack increasingly difficult tasks such as continuously spoken, very-large-vocabulary, speaker-independent, noisy speech over the telephone. For some limited tasks, e.g., recognizing credit card numbers, or words drawn from medium-sized vocabularies and spoken with frequent pauses, recognition accuracy rises above 99%, and hence practical commercial products are available. However, progress has been slow for the more difficult tasks of recognizing conversational speech or noisy speech. Furthermore, recognition error rates remain high when generalized speaker-independent models are used to decode speech from speakers not included in the training phase. The theme of our proposed work, increasing the robustness of ASR, addresses the tendency for error rates to increase when sounds other than the desired speech corrupt the input signal or when speakers not used in training use the system. In our view, a major obstacle to raising robustness is the inadequacy of current spectral analysis methods. We propose to replace current analysis methods with a more appropriate technique that resists corruption by noise and will further allow more efficient ways to adapt the ASR models to each new speaker's voice, including speech with significant accents.
Our research, if successful, would lead to significantly improved ASR performance, both in increased accuracy and decreased computation. It will lead eventually to a much more agreeable way for anyone to interact with computers.
