More efficient and accurate automatic speech recognition
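Basic information
- Grant number: RGPIN-2018-05226
- Principal investigator:
- Amount: $20,400
- Host institution:
- Host institution country: Canada
- Program type: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Duration: 2018-01-01 to 2019-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract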
Current Automatic Speech Recognition (ASR) uses stochastic methods that exclude much of what is known about human speech production and perception. For 30 years, ASR has relied on Hidden Markov Models (HMMs) and, more recently, Deep Neural Networks (DNNs). Both are engineering approaches that emphasize recognition accuracy while tolerating ever-increasing cost (computer memory and processing). Neural networks (NNs) have existed for decades, but applications were mostly limited to 3-level multilayer perceptrons, with limited capacity to handle the wide range of variability in speech (sources, channels, speakers, contexts, environments). Despite much increased use of ASR (e.g., Siri, Alexa), performance is still not near human levels, especially in noisy conditions (e.g., the many cases where prior model training is limited). Continuing along the lines of recent DNN-based ASR research is unlikely to approach acceptable accuracy in many cases unless major changes are made to the methodology.
Early ASR methodology in the 1970s used mostly expert-system (ES) approaches, exploiting ideas of how vocal-tract resonances (called formants) relate to the phonemes intended by speakers, and focused on the spectral peaks of speech, as this is how the ear filters speech inside the cochlea. In the early 1980s, HMMs took over the ASR field, as they were much better at handling variability than simple "if-then" algorithms. Nonetheless, if one could track significant aspects of resonances reliably in poor acoustic conditions (which human listeners handle well), then useful ASR decisions could be made at far lower cost than with recent end-to-end DNN approaches. It is here proposed to combine structural and stochastic information in ASR, exploiting well-known (but, in ASR, little used) knowledge of how humans carry out speech communication.
Another major deficiency of ASR is its failure to use intonation, despite the evidence that intonation facilitates human speech communication. Human intonation in speech production (which is clearly exploited by human listeners) spans long time ranges, making such information very difficult to track in current systems that rely on either raw speech (at 8000 samples/s or higher) or 10-ms frames of spectral data. The relative success of modern HMM and DNN approaches shows that one can succeed (to a certain level of performance) without using intonation, as many practical speech inputs to ASR are simple, short phrases (and recorded in good-quality environments). Nonetheless, proper use of intonation in ASR would surely raise recognition accuracy, just as incorporating language models (LMs) into ASR did in the 1980s.
We will improve the robustness of ASR to common acoustic degradations, increase its efficiency, and exploit intonation. The long-term objective: accurate and efficient ASR, approaching the performance of human listeners. Short-term objectives: 1) a better spectral measure than filter-bank energies, 2) faster and better adaptation, 3) integration of aspects of intonation.
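The figures quoted in the abstract (8000 samples/s, 10-ms frames, spectral peaks as resonance cues) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the project's method: the frame is a toy signal built from two sinusoids standing in for vocal-tract resonances, and the helper names `synth_frame`, `magnitude_spectrum`, and `spectral_peaks` are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000                            # samples/s, as quoted in the abstract
FRAME_MS = 10                                 # frame length quoted in the abstract
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 80 samples per frame


def synth_frame(resonances_hz, n=FRAME_LEN, rate=SAMPLE_RATE):
    """Toy frame (assumption): a sum of sinusoids standing in for resonances."""
    return [sum(math.sin(2 * math.pi * f * t / rate) for f in resonances_hz)
            for t in range(n)]


def magnitude_spectrum(frame):
    """Plain DFT magnitudes for bins 0..N/2, standard library only."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_peaks(mags, rate=SAMPLE_RATE, rel_threshold=0.5):
    """Frequencies (Hz) of local maxima above a fraction of the largest bin."""
    n = 2 * (len(mags) - 1)                   # original frame length
    floor = rel_threshold * max(mags)
    return [k * rate / n
            for k in range(1, len(mags) - 1)
            if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]


frame = synth_frame([500, 1500])              # two hypothetical resonances
peaks = spectral_peaks(magnitude_spectrum(frame))
frames_per_second = 1000 // FRAME_MS          # 100 frames for each second of speech
print(FRAME_LEN, peaks, frames_per_second)
```

A single 10-ms frame holds only 80 samples, so an intonation contour lasting a second or more is spread over 100 or more consecutive frames. That gap in time scale is the tracking difficulty the abstract describes.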
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by OShaughnessy, Douglas
Other grants held by OShaughnessy, Douglas
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2022
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2021
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2020
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2019
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2017
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2016
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2015
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2014
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2013
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
Improving basic methods of automatic speech recognition
- Grant number: 914-2008
- Fiscal year: 2012
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
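Similar grants from the National Natural Science Foundation of China
Applications of fixed-parameter tractable algorithms to planar graph problems and their relationship to integer linear programming
- Grant number: 60973026
- Approval year: 2009
- Funding amount: 320,000 yuan
- Program type: General Program
Similar overseas grants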
Towards More Efficient and Accurate Deep Learning Models for Segmentation, Classification, and Tracking
- Grant number: RGPIN-2022-04953
- Fiscal year: 2022
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
Towards More Efficient and Accurate Deep Learning Models for Segmentation, Classification, and Tracking
- Grant number: DGECR-2022-00416
- Fiscal year: 2022
- Funding amount: $20,400
- Program type: Discovery Launch Supplement
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2022
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2021
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
A novel Smart Metal Detector (SMD) to detect and locate real threats (e.g., handguns and knives) without interrupting the normal flow of the public, thus leading to far more accurate, efficient and cost-effective security screening
- Grant number: 39814
- Fiscal year: 2020
- Funding amount: $20,400
- Program type: Study
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2020
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
- Grant number: RGPIN-2018-05226
- Fiscal year: 2019
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2017
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2016
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
- Grant number: 914-2013
- Fiscal year: 2015
- Funding amount: $20,400
- Program type: Discovery Grants Program - Individual