
Development of innovative speech enhancement algorithms based on the central auditory system.

Basic information

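Grant number:
RGPIN-2014-05301
Principal investigator:
Plourde, Eric
Amount:
$16,000
Host institution:
Université de Sherbrooke
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2016
Funding country:
Canada
Start and end dates:
2016-01-01 to 2017-12-31
Project status:
Completed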

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=611558
Keywords:
Development innovative speech enhancement algorithms

Project abstract

Multimedia devices such as tablets, smart phones and now smart watches and glasses are commonly used in noisy environments by millions of Canadians. These devices include many speech processing algorithms, such as speech coders or automatic speech recognizers (ASR), whose performance is seriously degraded by noise. For example, an ASR can identify 85% of the words correctly in a noise-free environment; however, this percentage can drop to 31% at a signal-to-noise ratio (SNR) of 10 dB. To limit this decrease in performance, these devices include speech enhancement (SE) modules, which aim to reduce the noise level without affecting speech quality. The performance of these SE modules is largely sub-optimal: a study comparing 14 of the best SE algorithms reports a maximum subjective score of barely 3/5 at an SNR of 10 dB. In sharp contrast, the auditory system deals very well with noise: humans can fairly easily follow a conversation in a relatively noisy environment. The long-term objective (10+ years) of my research program is thus to develop commercially viable SE algorithms inspired by the central auditory system, i.e. the part of the auditory system between the auditory nerve and the auditory cortex, with the goal of approaching the excellent performance of the auditory system in the presence of noise. In the short term (less than 5 years), the main objectives are to statistically model how neurons of the central auditory system represent noisy vocalizations and to develop SE algorithms based on these models. To achieve these short-term objectives, we will first represent neural discharges as point processes and use this representation to develop statistical models of neural coding and decoding, where neural coding is the estimation of a spike train given a stimulus, such as a vocalization, and neural decoding is the estimation of a stimulus given a spike train.
These models will specifically take into account the presence of noise in vocalizations. Furthermore, we will use the derived models to develop statistical estimators for SE. Since these estimators will be set in a domain closer to that of the central auditory system, we expect them to be more perceptually relevant and thus more efficient. The recent development of accurate, yet simple, statistical models of neural signals opens a promising research avenue for SE, which the current proposal will exploit. Moreover, this proposal will allow for the training of multidisciplinary researchers with skills in neuroscience, statistical signal processing and speech processing. Upon completion, this program will improve the performance of SE modules and will therefore allow much more efficient use of millions of portable multimedia devices such as tablets, smart phones, watches and glasses.
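
The abstract describes neural coding as estimating a spike train from a stimulus, and neural decoding as estimating a stimulus from a spike train, with neural discharges treated as point processes. The sketch below is a toy illustration of that idea only, not the project's actual models. Everything in it is an assumption made for illustration: the function names (`tone`, `add_noise`, `rate`, `encode`, `log_likelihood`, `decode`), the log-linear rate model, the sinusoidal stand-ins for vocalizations, and all parameter values. The decoder is deliberately naive: it compares the spike train against clean candidate stimuli and ignores the noise.

```python
import math
import random

DT = 0.001  # bin width in seconds (assumed); rate * DT must stay below 1


def tone(freq_hz, duration_s=2.0, dt=DT):
    """Toy stand-in for a vocalization: a unit-amplitude sinusoid."""
    n_bins = int(round(duration_s / dt))
    return [math.sin(2.0 * math.pi * freq_hz * i * dt) for i in range(n_bins)]


def add_noise(stimulus, sigma, rng):
    """Add white Gaussian noise to the stimulus (a 'noisy vocalization')."""
    return [s + rng.gauss(0.0, sigma) for s in stimulus]


def rate(stimulus, bias=math.log(20.0), gain=1.5):
    """Assumed log-linear encoding model: firing rate in spikes/s per bin."""
    return [math.exp(bias + gain * s) for s in stimulus]


def encode(stimulus, dt=DT, rng=None):
    """Neural coding: draw a spike train, one Bernoulli trial per bin."""
    rng = rng or random.Random()
    return [1 if rng.random() < r * dt else 0 for r in rate(stimulus)]


def log_likelihood(spikes, stimulus, dt=DT):
    """Discrete-time point-process log-likelihood of spikes given a stimulus."""
    total = 0.0
    for n, r in zip(spikes, rate(stimulus)):
        p = r * dt  # probability of a spike in this bin
        total += math.log(p) if n else math.log(1.0 - p)
    return total


def decode(spikes, candidates, dt=DT):
    """Neural decoding: pick the candidate stimulus with the highest likelihood."""
    return max(candidates, key=lambda name: log_likelihood(spikes, candidates[name], dt))


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = {"4 Hz": tone(4.0), "7 Hz": tone(7.0)}
    noisy = add_noise(candidates["7 Hz"], sigma=0.3, rng=rng)
    spikes = encode(noisy, rng=rng)
    print(sum(spikes), "spikes; decoded as", decode(spikes, candidates))
```

A noise-aware model of the kind the abstract proposes would presumably build the noise statistics into the likelihood itself rather than ignore them as this sketch does.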
