More efficient and accurate automatic speech recognition

Basic information

Project abstract

Current Automatic Speech Recognition (ASR) uses stochastic methods that exclude much of what is known about human speech production and perception. For 30 years, ASR has used Hidden Markov Models (HMMs) and, more recently, Deep Neural Networks (DNNs). Both are engineering approaches that emphasize recognition accuracy while tolerating ever-increasing cost (computer memory and processing). Neural networks (NNs) have existed for decades, but applications were mostly limited to 3-level multilayer perceptrons, which have limited capacity to handle the wide range of variability in speech (sources, channels, speakers, contexts, environments). Despite much increased use of ASR (e.g., Siri, Alexa), performance is still not near human levels, especially in noisy conditions (e.g., the many cases where prior model training is limited). Continuing along the lines of recent DNN ASR research is unlikely to reach acceptable accuracy in many cases unless major changes are made to the methodology. Early ASR methodology in the 1970s used mostly expert-system (ES) approaches, which exploited ideas of how vocal-tract resonances (called formants) relate to the phonemes intended by speakers and focused on the spectral peaks of speech, as this is how the ear filters speech inside the cochlea. In the early 1980s, HMMs took over the ASR field because they were much better at handling variability than simple “if-then” algorithms. Nonetheless, if one could reliably track significant aspects of resonances in poor acoustic conditions (which human listeners handle well), then useful ASR decisions could be made at far lower cost than with recent end-to-end DNN approaches. It is proposed here to combine structural and stochastic information in ASR, exploiting well-known (but, in ASR, little-used) knowledge of how humans communicate through speech. Another major deficiency of ASR is that it makes no use of intonation, despite ample evidence that intonation facilitates human speech communication.
Human intonation in speech production (which is clearly exploited by human listeners) spans long time ranges, making such information very difficult to track in current systems that rely on either raw speech (at 8000 samples/s or higher) or 10-ms frames of spectral data. The relative success of modern HMM and DNN approaches shows that one can succeed (to a certain level of performance) without using intonation, as many practical speech inputs to ASR are simple, short phrases (and are recorded in good-quality environments). Nonetheless, proper use of intonation in ASR would surely raise recognition accuracy, just as incorporating language models (LMs) into ASR did in the 1980s. We will improve the robustness of ASR to common acoustic degradations, increase its efficiency, and exploit intonation. The long-term objective: accurate and efficient ASR, approaching the performance of human listeners. Short-term objectives: 1) a better spectral measure than filter-bank energies, 2) faster and better adaptation, 3) integration of aspects of intonation.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants held by OShaughnessy, Douglas

More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2022
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2021
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2019
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2018
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2017
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2016
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2015
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2014
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More Accurate and Efficient Analysis for Automatic Speech Recognition
  • Grant number: 914-2013
  • Fiscal year: 2013
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
Improving basic methods of automatic speech recognition
  • Grant number: 914-2008
  • Fiscal year: 2012
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual

Similar grants from the National Natural Science Foundation of China (NSFC)

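Research on accurate and efficient integral-equation methods for electromagnetic scattering based on characteristic basis functions
  • Grant number: 62371228
  • Approval year: 2023
  • Funding amount: CNY 490,000
  • Program type: General Program
Analytical methods for efficient and accurate determination of the absolute isotope ratios of Pb isotope standard reference materials by MC-ICP-MS
  • Grant number: 42003017
  • Approval year: 2020
  • Funding amount: CNY 240,000
  • Program type: Young Scientists Fund
Accurate and efficient image blur detection and assessment methods for complex and diverse blur
  • Grant number:
  • Approval year: 2020
  • Funding amount: CNY 540,000
  • Program type: General Program
Research on the mechanisms affecting the analytical accuracy of PGAI technology and on efficient information analysis methods
  • Grant number: 11775113
  • Approval year: 2017
  • Funding amount: CNY 780,000
  • Program type: General Program
Efficient isolation and accurate, sensitive electrochemical sensing of circulating tumor cells in peripheral blood
  • Grant number: 21675128
  • Approval year: 2016
  • Funding amount: CNY 650,000
  • Program type: General Program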

Similar overseas grants

Towards More Efficient and Accurate Deep Learning Models for Segmentation, Classification, and Tracking
  • Grant number: RGPIN-2022-04953
  • Fiscal year: 2022
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
Towards More Efficient and Accurate Deep Learning Models for Segmentation, Classification, and Tracking
  • Grant number: DGECR-2022-00416
  • Fiscal year: 2022
  • Funding amount: $20,400
  • Program type: Discovery Launch Supplement
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2022
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
More efficient and accurate automatic speech recognition
  • Grant number: RGPIN-2018-05226
  • Fiscal year: 2021
  • Funding amount: $20,400
  • Program type: Discovery Grants Program - Individual
A novel Smart Metal Detector (SMD) to detect and locate real threats (e.g. handguns and knives) without interrupting the normal flow of public, thus leading to far more accurate, efficient and cost-effective security screening
  • Grant number: 39814
  • Fiscal year: 2020
  • Funding amount: $20,400
  • Program type: Study