Interference in spoken communication: Evaluating the corrupting and disrupting effects of other voices

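Basic information

Grant number:
ES/N014383/1
Principal investigator:
Brian Roberts
Amount:
$420,400
Host institution:
Aston University
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2016
Funding country:
United Kingdom
Duration:
2016 to (no data)
Project status:
Completed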

Source:
https://gtr.ukri.org/projects?ref=ES%2FN014383%2F1
Keywords:
Interference spoken communication Evaluating corrupting

Project abstract

In everyday life, talking with other people is important not only for sharing knowledge and ideas, but also for maintaining a sense of belonging to a community. Most people take it for granted that they can converse with others with little or no effort. Successful communication involves understanding what is being said and being understood, but it is quite rare to hear the speech of a particular talker in isolation. Speech is typically heard in the presence of interfering sounds, which are often the voices of other talkers. The human auditory system, which is responsible for our sense of hearing, therefore faces the challenge of identifying which parts of the sounds reaching our ears have come from which talker.

Solving this "auditory scene analysis" problem involves separating those sound elements arising from one source (e.g., the voice of the talker to whom you are attending) from those arising from other sources, so that the identity and meaning of the target source can be interpreted by higher-level processes in the brain. Over the course of evolution, humans have been exposed to a variety of complex listening environments, and so we are generally very successful at understanding the speech of one person in the presence of other talkers. This contrasts with attempts to develop listening machines, which often fail when confronted with adverse conditions, such as automatic transcription of a conversation in an open-plan office. Human listeners with hearing impairment often find these environments especially difficult, even when using the latest developments in hearing-aid or cochlear-implant design, and so can struggle to communicate effectively in such conditions.

Much of the information necessary to understand speech (acoustic-phonetic information) is carried by the changes in frequency over time of a few broad peaks in the frequency spectrum of the speech signal, known as formants. The project aims to investigate how listeners presented with mixtures of target speech and interfering formants are able to group together the appropriate formants, and to reject others, such that the speech of the talker we want to listen to can be understood. Interfering sounds can have two kinds of effect - energetic masking, in which the neural response of the ear to the target is swamped by the response to the masker, and informational masking, in which the "auditory brain" fails to separate readily detectable parts of the target from the masker. The project will explore the informational masking component of interference - often the primary factor limiting speech intelligibility - using stimulus configurations that eliminate energetic masking. We will do so using perceptual experiments in which we measure how our ability to understand speech (e.g., the number of words reported correctly) changes under a variety of conditions.

The project will examine how acoustic-phonetic information is combined across formants. It will also explore how a speech-like interferer affects intelligibility, distinguishing the circumstances in which the interferer takes up some of the available perceptual processing capacity from those in which specific properties of the interferer intrude into the perception of the target speech. Our approach is to use artificial speech-like stimuli with precisely controlled properties, to mix target speech with carefully designed interferers that offer alternative grouping possibilities, and to measure how manipulating the properties of these interferers affects listeners' abilities to recognise the target speech in the mixture. The results will improve our understanding of how human listeners separate speech from interfering sounds and the constraints on that separation, helping to refine computational models of listening. Such refinements will in turn provide ways of improving the performance of devices such as hearing aids and automatic speech recognisers when they operate in adverse listening conditions.
