More efficient and accurate automatic speech recognition

Basic information

Project abstract

Current Automatic Speech Recognition (ASR) uses stochastic methods that exclude much of what is known about human speech production and perception. For 30 years, ASR has relied on Hidden Markov Models (HMMs) and, more recently, Deep Neural Networks (DNNs). Both are engineering approaches that emphasize recognition accuracy while tolerating ever-increasing cost (computer memory and processing). Neural networks (NNs) have existed for decades, but applications were mostly limited to 3-level multilayer perceptrons, with limited capacity to handle the wide range of variability in speech (sources, channels, speakers, contexts, environments). Despite much increased use of ASR (e.g., Siri, Alexa), performance is still not near human levels, especially in noisy conditions (e.g., many cases where prior model training is limited). Continuing recent DNN ASR research is unlikely to approach acceptable accuracy in many cases unless major changes are made to the methodology.

Early ASR methodology in the 1970s used mostly expert-system (ES) approaches, exploiting ideas of how vocal-tract resonances (called formants) relate to the phonemes intended by speakers, and focused on the spectral peaks of speech, as this is how the ear filters speech inside the cochlea. In the early 1980s, HMMs took over the ASR field, as they were much better at handling variability than simple “if-then” algorithms. Nonetheless, if one could reliably track significant aspects of resonances in poor acoustic conditions (which human listeners handle well), then useful ASR decisions could be made at far lower cost than with recent end-to-end DNN approaches. It is here proposed to combine structural and stochastic information in ASR, exploiting well-known (but, in ASR, little used) knowledge of how humans carry out speech communication.

Another major deficiency of ASR is its lack of use of intonation, despite all evidence that intonation facilitates human speech communication. Human intonation in speech production (which is clearly exploited by human listeners) has long time ranges, making such information very difficult to track in current systems that rely on either raw speech (at 8000 samples/s or higher) or 10-ms frames of spectral data. The relative success of modern HMM and DNN approaches shows that one can succeed (to a certain level of performance) without using intonation, as many practical speech inputs to ASR are simple, short phrases (recorded in good-quality environments). Nonetheless, proper use of intonation in ASR would surely raise recognition accuracy, just as incorporating language models (LMs) into ASR did in the 1980s.

We will improve the robustness of ASR to common acoustic degradations, increase efficiency, and exploit intonation. The long-term objective: accurate and efficient ASR, approaching that of human listeners. Short-term objectives: 1) a better spectral measure than filter-bank energies, 2) faster and better adaptation, 3) integration of aspects of intonation.
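
As a rough illustration of the time-scale mismatch described for intonation, the arithmetic can be sketched in a few lines of Python. This is not part of the proposal: the function name `frame_stats` and the 2-second phrase duration are assumptions chosen for illustration; only the 8000 samples/s rate and the 10-ms frame length come from the abstract.

```python
def frame_stats(duration_s, sample_rate_hz=8000, frame_ms=10):
    """Return (total samples, samples per frame, whole frames) for an utterance.

    Hypothetical helper: sample rate and frame length follow the abstract;
    the utterance duration is supplied by the caller.
    """
    total_samples = round(duration_s * sample_rate_hz)
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    whole_frames = total_samples // samples_per_frame
    return total_samples, samples_per_frame, whole_frames


# Assumed example: a 2-second phrase.
samples, per_frame, frames = frame_stats(2.0)
print(samples, per_frame, frames)
```

Under these assumptions, a single intonation contour over the phrase spans thousands of raw samples or hundreds of spectral frames, which suggests why a system operating frame by frame has difficulty capturing it.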

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants held by OShaughnessy, Douglas

More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2022
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2021
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2019
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2018
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2017
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2016
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2015
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2014
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2013
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
Improving basic methods of automatic speech recognition
  • Grant number: 914-2008
  • Fiscal year: 2012
  • Funding amount: $20,400
  • Program: Discovery Grants Program - Individual
