General Auditory Model of Adaptive Perception of Speech

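Basic Information

Grant number: 8211404
Principal investigator: Antonia David Vitela
Amount: $42,200
Host institution: UNIVERSITY OF ARIZONA
Host institution country: United States
Project category:
Fiscal year: 2011
Funding country: United States
Project period: 2011-01-01 to 2012-12-31
Project status: Completed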

Project Abstract

DESCRIPTION (provided by applicant): One of the fundamental challenges for communication by speech is the variability in speech production and acoustics. Talkers vary in the size and shape of their vocal tract, in dialect, and in speaking mannerisms. These differences all affect the acoustic output. Despite this lack of invariance in the acoustic signal, listeners can correctly perceive the speech of many different talkers. This ability to adapt one's perception to the particular acoustic structure of a talker has been investigated for over fifty years. The prevailing explanation for this phenomenon is that listeners construct talker-specific representations that can serve as referents for subsequent speech sounds. Specifically, it is thought that listeners may either be creating mappings between acoustics and phonemes or extracting the vocal tract anatomy and shape for each individual talker.

The proposed research focuses on an alternative explanation, which takes a more general auditory approach. Data from previous studies have indicated that listeners may be calculating an average spectral representation of a talker's speech (the long-term average spectrum, LTAS) and using that as a referent. This process/representation is not speech-specific but can still accommodate some of the talker-specific variability. In previous work, I have developed a model of perceptual adaptation that relies on the computation of the LTAS. The goal of this project is to further develop and test this model by determining a more accurate estimate of the effective representation of the LTAS and comparing its predictions to those of perceptual learning approaches.

In order to accomplish these goals, the time window over which the LTAS is computed by listeners must be determined (Aim #1). The project includes a series of experiments in which preceding context is added to a target sound to determine the effect on categorization of the target. By increasing the duration of the context (with each duration changing the LTAS), I can determine how much of the context is effective in eliciting a perceptual effect. One of the innovations of these studies is that the speech is synthesized using a realistic vocal tract model, allowing acoustic control constrained by realistic articulations. It also allows me to create different "talkers" with knowable anatomical/articulatory differences.

This model will be tested against a traditional approach in predicting the effect on listeners of being exposed to novel "dialects" or "accents". Vowel productions will be shifted to produce learnable differences in vowel categorization for listeners, but these shifts will have independent effects on the LTAS of the talker. In this way, I will be able to test which model best explains the perceptual data. The development of such a model will delimit the ability of listeners to accommodate variations due to anatomical differences, accent, dialect and even motor speech disorders. It will also indicate which information in the signal is important for adaptive perception of complex sounds and may be distorted by signal processing in hearing aids and cochlear implants.

PUBLIC HEALTH RELEVANCE: This research will provide insight into the processes/representations involved in the ability of listeners to accommodate the variability in speech arising from differences in talker characteristics including anatomy, speaking style, accent and motor disability. Current hearing aid and cochlear implant systems can disrupt some of this information that could be critical for robust speech perception. The results could impact the future development of these hearing devices, as well as strategies for improving intelligibility.
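
As a rough illustration of the quantity the abstract centers on, the sketch below computes an LTAS over only the most recent part of a preceding context, so that changing the window length changes the resulting spectrum. It is not the applicant's model: the function name `ltas_db`, the frame length, the hop size, the Hann taper, the dB scaling, and the two-tone "context" are all assumptions chosen for illustration.

```python
import numpy as np


def ltas_db(signal, sample_rate, window_s, frame_len=512, hop=256):
    """Return (freqs_hz, ltas_db) for the final `window_s` seconds of `signal`.

    Hypothetical helper: averages the power spectra of overlapping,
    Hann-tapered frames. All analysis parameters are illustrative assumptions.
    """
    n_keep = int(round(window_s * sample_rate))
    if n_keep <= 0:
        raise ValueError("window_s must be positive")
    # Keep only the most recent n_keep samples (the assumed integration window).
    segment = np.asarray(signal, dtype=float)[-n_keep:]
    if len(segment) < frame_len:
        raise ValueError("window is shorter than one analysis frame")

    taper = np.hanning(frame_len)
    starts = range(0, len(segment) - frame_len + 1, hop)
    # Average power across frames: this mean is the "long-term average".
    power = np.mean(
        [np.abs(np.fft.rfft(segment[s:s + frame_len] * taper)) ** 2 for s in starts],
        axis=0,
    )
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, 10.0 * np.log10(power + 1e-12)


# Toy context (an assumption, not speech): 1 s at 500 Hz followed by 0.25 s at 2000 Hz.
fs = 16000
early = np.sin(2 * np.pi * 500 * np.arange(fs) / fs)
late = np.sin(2 * np.pi * 2000 * np.arange(fs // 4) / fs)
context = np.concatenate([early, late])

# A short window sees only the recent tone; a long window is dominated by the earlier one.
freqs, short_ltas = ltas_db(context, fs, window_s=0.2)
_, long_ltas = ltas_db(context, fs, window_s=1.25)
short_peak = freqs[np.argmax(short_ltas)]
long_peak = freqs[np.argmax(long_ltas)]
print(short_peak, long_peak)
```

In this toy case the peak of the LTAS moves depending on how much of the context falls inside the window, which is the kind of dependence the context-duration experiments in Aim #1 are designed to probe in listeners.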
