RI: Medium: Deep Neural Networks for Robust Speech Recognition through Integrated Acoustic Modeling and Separation

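Basic Information

Award number:
1409431
Principal investigator:
Eric Fosler-Lussier
Amount:
approximately $798,100
Institution:
Ohio State University
Institution country:
United States
Project type:
Continuing Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-06-01 to 2019-05-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1409431&HistoricalAwards=false
Keywords:
RI Medium Deep Neural Networks

Project Abstract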

Over the last decade, speech recognition technology has become steadily more present in everyday life, as seen in the proliferation of applications including mobile personal agents and transcription of voicemail messages. Performance of these systems, however, degrades significantly in the presence of background noise; for example, using speech recognition technology in a noisy restaurant or on a windy street can be difficult because speech recognizers confuse the background noise with linguistic content. Compensation for noise typically involves preprocessing the acoustic signal to emphasize the speech signal (i.e., speech separation) and then feeding this processed input into the recognizer. The innovative approach in this project is to train the recognition and separation systems in an integrated manner so that the linguistic content of the signal can inform the separation, and vice versa. Given the impact of the recent resurgence of Deep Neural Networks (DNNs) in speech processing, this project seeks to make DNNs more resistant to noise by integrating speech separation and speech recognition, exploring three related areas. The first research area seeks to stabilize input to DNNs by combining DNN-based suppression and acoustic modeling, integrating masking estimates across time and frequency, and using this information to improve reconstruction of speech from noisy input. The second area seeks to examine a richer DNN structure, using multi-task learning techniques to guide the construction of DNNs that are better at performing all tasks and whose layers have meaningful structure. The final research area examines ways to adapt the spurious output of DNN acoustic models given acoustic noise. With its focus on integrating speech separation and recognition, the project will be evaluated both by measuring speech recognition performance and by metrics that are more closely related to human speech perception.
This will ensure a broader impact of this research by not only providing insights for speech technology but also facilitating the design of next-generation hearing technology in the long run.
