Development of innovative speech enhancement algorithms based on the central auditory system.

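Basic information

Grant number:
RGPIN-2014-05301
Principal investigator:
Plourde, Eric
Amount:
About $16,000
Host institution:
Université de Sherbrooke
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Start and end dates:
2019-01-01 to 2020-12-31
Project status:
Completed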

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=678653
Keywords:
Development innovative speech enhancement algorithms

Project abstract

Multimedia devices such as tablets, smartphones, and now smart watches and glasses are used in noisy environments by millions of Canadians. These devices include many speech processing algorithms, such as speech coders and automatic speech recognizers (ASR), whose performance is seriously degraded by noise. For example, an ASR can identify 85% of words correctly in a noise-free environment; however, this percentage can drop to 31% at a signal-to-noise ratio (SNR) of 10 dB. To limit this loss in performance, these devices include speech enhancement (SE) modules, which aim to reduce the noise level without affecting speech quality. The performance of these SE modules is largely sub-optimal: a study comparing 14 of the best SE algorithms reports a maximum subjective score of barely 3/5 at an SNR of 10 dB. In sharp contrast, the auditory system deals very well with noise; humans can follow a conversation in a relatively noisy environment fairly easily.

The long-term objective (10+ years) of my research program is thus to develop commercially viable SE algorithms inspired by the central auditory system, i.e. the part of the auditory system between the auditory nerve and the auditory cortex, with the goal of approaching the excellent performance of the auditory system in the presence of noise. In the short term (less than 5 years), the main objectives are to statistically model how neurons of the central auditory system represent noisy vocalizations and to develop SE algorithms based on these models.

To achieve these short-term objectives, we will first represent neural discharges as point processes and use this representation to develop statistical models of neural coding and decoding. Neural coding is the estimation of a spike train given a stimulus, such as a vocalization; neural decoding is the estimation of a stimulus given a spike train. These models will specifically take into account the presence of noise in vocalizations. We will then use the derived models to develop statistical estimators for SE. Since these estimators will be set in a domain closer to that of the central auditory system, we expect them to be more perceptually relevant and thus more efficient.

The recent development of accurate, yet simple, statistical models of neural signals opens a promising research avenue for SE that this proposal will exploit. Moreover, the proposal will allow for the training of multidisciplinary researchers with skills in neuroscience, statistical signal processing, and speech processing. Upon completion, this program will improve the performance of SE modules and thereby allow much more efficient use of millions of portable multimedia devices such as tablets, smartphones, watches, and glasses.
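
The abstract's notions of neural coding (stimulus to spike train) and neural decoding (spike train to stimulus) under a point-process representation can be illustrated with a toy sketch. This is not the model proposed in the grant: it assumes a single neuron, a discrete-time approximation of an inhomogeneous Poisson process, an exponential link between stimulus and firing rate, and a stimulus that is constant over the observation window. The function names (`encode`, `log_likelihood`, `decode_constant`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def encode(stimulus, dt, baseline, gain, rng):
    """Neural coding: draw a 0/1 spike train given a stimulus.

    Assumed model (not from the proposal): conditional intensity
    rate(t) = exp(baseline + gain * stimulus(t)), in spikes per second,
    sampled as one Bernoulli trial per time bin of width dt.
    """
    spikes = []
    for s in stimulus:
        rate = math.exp(baseline + gain * s)
        p_spike = min(1.0, rate * dt)  # probability of a spike in this bin
        spikes.append(1 if rng.random() < p_spike else 0)
    return spikes


def log_likelihood(spikes, stimulus, dt, baseline, gain):
    """Poisson point-process log-likelihood of a binned spike train."""
    total = 0.0
    for n, s in zip(spikes, stimulus):
        log_rate = baseline + gain * s
        total += n * (log_rate + math.log(dt)) - math.exp(log_rate) * dt
    return total


def decode_constant(spikes, dt, baseline, gain):
    """Neural decoding: maximum-likelihood estimate of a constant stimulus.

    Setting the derivative of the log-likelihood to zero gives
    exp(baseline + gain * s) = count / duration.
    """
    count = sum(spikes)
    if count == 0:
        raise ValueError("no spikes observed; the estimate is undefined")
    duration = len(spikes) * dt
    return (math.log(count / duration) - baseline) / gain


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for repeatability
    dt = 0.001                       # 1 ms bins (illustrative)
    baseline = math.log(20.0)        # 20 spikes/s when the stimulus is 0
    gain = 2.0                       # illustrative stimulus sensitivity
    stimulus = [0.5] * 20000         # 20 s of a constant stimulus value
    spikes = encode(stimulus, dt, baseline, gain, rng)
    estimate = decode_constant(spikes, dt, baseline, gain)
    print("spike count:", sum(spikes))
    print("decoded stimulus:", estimate)
```

With enough spikes the decoded value should land close to the true stimulus value of 0.5. A model in the spirit of the proposal would additionally have to handle time-varying vocalizations and the noise mixed into them, which this sketch omits.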
