CAREER: Integrating perceptual models of auditory importance into deep learning-based noise-robust speech recognition

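Basic information

  • Award number:
    1750383
  • Principal investigator:
  • Amount:
    $497,200
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Start and end dates:
    2018-06-01 to 2023-07-31
  • Project status:
    Completed

Project abstract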

Hearing is central to human interaction, but the hearing process is not easily observed. The objective of this project is to train models to identify the portions of speech utterances that are important for human listeners to identify them correctly, and to use predictions from these models to make automatic speech recognition (ASR) systems more noise robust by focusing on those regions. The ability to identify important regions of an utterance could significantly advance our understanding of healthy and impaired hearing. Improvements in automatic speech recognition would have broader impacts on the 260 million Americans who use smartphones and on the $100 billion ASR industry. The educational portion of this project uses examples from speech, language, audio, and music processing to attract and retain students in Brooklyn College's introductory programming course, which serves a diverse student body, along with similar efforts at affiliated high school programs.

The team's preliminary results, obtained by measuring the intelligibility of a given utterance in many different noisy mixtures, have shown that some regions of an utterance are more important or useful than others in identifying it. This project expands upon these preliminary results in three ways. First, it measures ASR auditory importance using the team's existing slow but accurate technique involving random "bubble noise", comparing different ASR variants to each other and to human listeners. Second, it trains a model to predict ASR auditory importance from clean speech using a novel architecture called the bubble cooperative network (BCN), which allows the recognizer to be trained jointly with the BCN to improve performance. Third, it adapts the learned importance predictor to human listeners and uses this human-adapted importance predictor to further refine the ASR models. These tasks should permit the use of utterance-level human responses to directly improve the noise robustness of automatic speech recognition.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
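As a rough illustration of the first task in the abstract (measuring importance from many random noisy mixtures), the following Python sketch estimates a per-cell importance score from simulated trials. It is a toy under stated assumptions, not the project's actual procedure: the grid of time-frequency cells, the simulated listener, the function names `simulate_trials` and `importance_map`, and the scoring rule (audibility rate in correct trials minus audibility rate in incorrect trials) are all hypothetical choices made for this example.

```python
import random


def simulate_trials(n_trials=2000, n_cells=12, key_cells=(3, 7), seed=0):
    """Simulate bubble-noise trials for a toy listener (hypothetical setup).

    Each trial draws a random binary mask over time-frequency cells:
    1 means a "bubble" leaves that cell audible, 0 means noise covers it.
    The toy listener is correct only when every key cell is audible.
    """
    rng = random.Random(seed)
    masks, correct = [], []
    for _ in range(n_trials):
        mask = [1 if rng.random() < 0.5 else 0 for _ in range(n_cells)]
        masks.append(mask)
        correct.append(all(mask[k] for k in key_cells))
    return masks, correct


def importance_map(masks, correct):
    """Per cell: audibility rate in correct trials minus rate in incorrect trials."""
    hits = [m for m, c in zip(masks, correct) if c]
    misses = [m for m, c in zip(masks, correct) if not c]
    if not hits or not misses:
        raise ValueError("need both correct and incorrect trials")
    n_cells = len(masks[0])
    return [
        sum(m[k] for m in hits) / len(hits)
        - sum(m[k] for m in misses) / len(misses)
        for k in range(n_cells)
    ]


masks, correct = simulate_trials()
scores = importance_map(masks, correct)
# Indices of the two highest-scoring cells, in ascending order.
ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
top_two = sorted(ranked[:2])
print(top_two)
```

In this toy setup the two key cells should receive clearly higher scores than the rest, because they are always audible when the simulated listener is correct. A real experiment would replace the simulated listener with human or ASR responses to actual noisy mixtures.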

Project outputs

Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Importantaug: A Data Augmentation Agent for Speech
Large scale evaluation of importance maps in automatic speech recognition
  • DOI:
    10.21437/interspeech.2020-2883
  • Published:
    2020-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    V. Trinh;Michael I. Mandel
  • Corresponding author:
    V. Trinh;Michael I. Mandel
Bubble Cooperative Networks for Identifying Important Speech Cues
  • DOI:
    10.21437/interspeech.2018-2377
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Trinh, Viet Anh;McFee, Brian;Mandel, Michael I
  • Corresponding author:
    Mandel, Michael I
Directly Comparing the Listening Strategies of Humans and Machines
The Bubble Noise Technique for Speech Perception Research
  • DOI:
    10.1044/2019_pers-19-00058
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mandel, Michael I.;Grover, Vikas;Zhao, Mengxuan;Choi, Jiyoung;Shafer, Valerie L.
  • Corresponding author:
    Shafer, Valerie L.

Other publications by Michael Mandel

Gestural Query Specification
Prognostic Role of Serum Procalcitonin Levels in Hospitalized Patients With Malignant Pleural Effusion
  • DOI:
    10.1016/s2152-2650(24)00968-6
  • Published:
    2024-09-01
  • Journal:
  • Impact factor:
  • Authors:
    Chioma Nwachukwu;Dalia Zubidat;Chelsea Brown;Ifeoma Achebe;Michael Mandel
  • Corresponding author:
    Michael Mandel
Using Prosody to Improve Dependency Parsing
Two Faces of Progressive Dyspnea
  • DOI:
    10.1378/chest.117.5.1500
  • Published:
    2000-05-01
  • Journal:
  • Impact factor:
  • Authors:
    Steven M. Kawut;Michael Mandel;Selim M. Arcasoy
  • Corresponding author:
    Selim M. Arcasoy
MM-429 Prognostic Role of Serum Procalcitonin Levels in Hospitalized Patients With Malignant Pleural Effusion
  • DOI:
    10.1016/s2152-2650(24)01681-1
  • Published:
    2024-09-01
  • Journal:
  • Impact factor:
  • Authors:
    Chioma Nwachukwu;Dalia Zubidat;Chelsea Brown;Ifeoma Achebe;Michael Mandel
  • Corresponding author:
    Michael Mandel

Other grants by Michael Mandel

RI: Small: Concatenative Resynthesis for Very High Quality Speech Enhancement
  • Award number:
    1618061
  • Fiscal year:
    2016
  • Award amount:
    $497,200
  • Award type:
    Continuing Grant
Is Local Government Representative? a Study of Attitudes
  • Award number:
    7905335
  • Fiscal year:
    1979
  • Award amount:
    $497,200
  • Award type:
    Standard Grant

Similar overseas grants

Soft robot hand with texture recognition capability approaching that of humans by integrating polymer photoengineering and perceptual processing mechanisms
  • Award number:
    22H01447
  • Fiscal year:
    2022
  • Award amount:
    $497,200
  • Award type:
    Grant-in-Aid for Scientific Research (B)
CRCNS: Integrating sensory and prior information to control behavior
  • Award number:
    10687117
  • Fiscal year:
    2020
  • Award amount:
    $497,200
  • Award type:
CRCNS: Integrating sensory and prior information to control behavior
  • Award number:
    10264116
  • Fiscal year:
    2020
  • Award amount:
    $497,200
  • Award type:
The role of the striatal GABA and glutamate system for the modulation of the congruency sequence effect by perceptual processes: integrating behavior, psychophysiology, and structure-specific neurochemistry
  • Award number:
    422643708
  • Fiscal year:
    2019
  • Award amount:
    $497,200
  • Award type:
    Research Grants
Integrating Attribute Decision Heuristics into Travel Choice Models that accommodate Risk Attitude and Perceptual Conditioning
  • Award number:
    DP140100909
  • Fiscal year:
    2014
  • Award amount:
    $497,200
  • Award type:
    Discovery Projects
Integrating Perceptual Learning Approaches into Effective Therapies for Low Visio
  • Award number:
    8717669
  • Fiscal year:
    2013
  • Award amount:
    $497,200
  • Award type:
Integrating Perceptual Learning Approaches into Effective Therapies for Low Visio
  • Award number:
    8560386
  • Fiscal year:
    2013
  • Award amount:
    $497,200
  • Award type:
Integrating Perceptual Learning Approaches into Effective Therapies for Low Visio
  • Award number:
    8889690
  • Fiscal year:
    2013
  • Award amount:
    $497,200
  • Award type:
Integrating Perceptual Learning Approaches into Effective Therapies for Low Visio
  • Award number:
    9128021
  • Fiscal year:
    2013
  • Award amount:
    $497,200
  • Award type:
CAREER: Integrating Perceptual and Linguistic Information in Models of Semantic Representation
  • Award number:
    1056744
  • Fiscal year:
    2011
  • Award amount:
    $497,200
  • Award type:
    Continuing Grant