A Model Based Approach Towards Practical Blind Enhancement of Audio Signals Acquired in Real Acoustic Environments

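Basic Information

Grant number:
EP/D051207/1
Principal investigator:
James Hopgood
Amount:
$158,200
Host institution:
University of Edinburgh
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2006
Funding country:
United Kingdom
Start and end dates:
2006 to (no data)
Project status:
Completed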

Source:
https://gtr.ukri.org/projects?ref=EP%2FD051207%2F1
Keywords:
Model Based Approach Towards Practical

Project Abstract

This proposal concerns enhancing the quality and intelligibility of audio.
The ubiquity of digital audio in broadcasting, storage, and multimedia applications, each offering crystal-clear sound quality, has resulted in a heightened awareness and expectation of the achievable performance of applications involving audio signals: digital hearing aids should outperform their analogue counterparts in concert halls, speech recognition software should achieve high recognition rates in office environments, and hands-free telephones must produce intelligible speech when used in car cabins.
The quality and intelligibility of speech obtained in these scenarios are constrained not just by the reproduction quality of the hardware itself; rather, they depend on the acoustical properties of the environment in which the audio is acquired.
Specifically, audio signals in confined acoustic environments exhibit reverberation; this causes problems in two major classes of signal processing applications. The first is automatic speech recognition, in which it is more difficult to identify reverberant speech than closely coupled speech. This prevents hands-free interaction without the undesirable constraint that a user must carry a microphone close to their mouth. The second class involves the desire to improve speech quality and intelligibility from devices such as mobile and hands-free telephones and next-generation digital hearing aids. In each scenario, the presence of reverberation should be reduced to adequate levels by a robust speech enhancement algorithm that can be applied in any acoustic environment, and which does not rely on the acoustic properties being known a priori.
Since neither the acoustic impulse response (AIR) nor the source audio is known in this situation, the process of removing the effects of reverberation is known as blind dereverberation.
Previously, blind dereverberation has often been approached assuming the AIR between the source and sensor is time-invariant. This might be appropriate in scenarios where the source-sensor geometry is not rapidly varying, for example, a hands-free kit in a car cabin, in which the driver and the microphone are approximately fixed relative to one another, or in a work environment where a user is seated in front of a computer terminal in roughly the same configuration. However, there are many applications where the source-sensor geometry is subject to change; the wearer of a hearing aid will typically wish to move around a room, as might users of hands-free conference telephony equipment.
Moreover, it is not beyond possibility that the acoustics of the room itself vary; the changing state of doors, windows, or items being moved in the room will influence the room dynamics, as will a moving person. Consequently, in order to develop a blind dereverberation algorithm that is suitable for practical applications, it is important to account for source-sensor movement, and for possible changes in the acoustical properties of the room.
This proposal uses model-based signal processing, robust Bayesian statistical parameter estimation, and numerical optimisation methods in order to obtain practical algorithms to tackle this problem. Model-based signal processing is fundamentally based on the availability of realistic, tractable, and extensible models that reflect the underlying processes and systems involved. This proposal focuses on developing, implementing, testing, and applying a number of models that have not previously been investigated in blind speech dereverberation.
These include:
- a complete speech model that accounts for both voiced and unvoiced speech;
- a more realistic room acoustic model;
- subband methods for dealing with the complicated acoustic responses that occur in realistic acoustic environments;
- models that can account for varying source-sensor geometries;
- models that can be estimated using batch and sequential Monte Carlo methods.
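
As a rough illustration of the time-invariant setting described in the abstract, the sketch below forms a reverberant observation by convolving a source signal with an AIR. It is not taken from the project: the function names, the toy AIR (a unit direct path followed by exponentially decaying noise), and all parameter values are assumptions chosen only for illustration. In blind dereverberation, only `observed` would be available; `source` and `air` would both be unknown.

```python
import numpy as np


def synthetic_air(length, decay, rng):
    """Hypothetical toy AIR: unit direct path plus exponentially decaying noise.

    This is an illustrative assumption, not a measured or modelled room response.
    """
    n = np.arange(length)
    h = rng.standard_normal(length) * np.exp(-decay * n)
    h[0] = 1.0  # direct path from source to sensor
    return h


def reverberate(source, air):
    """Observed signal under a time-invariant AIR: x[n] = sum_k h[k] * s[n - k]."""
    return np.convolve(source, air)


rng = np.random.default_rng(0)
source = rng.standard_normal(1000)        # stand-in for the unknown source audio
air = synthetic_air(200, 0.02, rng)       # stand-in for the unknown AIR
observed = reverberate(source, air)       # what the microphone would record
```

A moving source or sensor, as discussed in the abstract, would break the single fixed `air` assumed here; the response would then have to change over time.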
