
KDI: Segmental and Prosodic Optical Phonetics for Human and Machine Speech Processing

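Basic Information

Award number:
9872849
Principal investigator:
Lynne Bernstein
Amount:
$1.41 million
Institution:
University of California-Los Angeles
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
1998
Funding country:
United States
Period:
1998-10-01 to 1998-12-31
Status:
Completed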

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9872849&HistoricalAwards=false
Keywords:
KDI Segmental Prosodic Optical Phonetics

Abstract

This is a three-year standard award. When people talk, they produce both acoustic and optical speech signals. When people with normal hearing and vision communicate face to face, they make use of both types of signals. Being able to see and hear a talker can enhance speech understanding in conditions of noisy or difficult-to-comprehend speech. The goals of this multidisciplinary project are to quantitatively characterize optical speech signals, examine how optical speech characteristics relate to acoustic and to physiologic speech characteristics, study several fundamental issues in human visual speech perception, and apply the knowledge obtained to optical speech synthesis. Across the entire project, the main questions we address are: (1) What speech information can perceivers get from seeing talkers? (2) How are optical and acoustic signals related to underlying speech articulations? (3) What are the perceptual and neurophysiological bases for visual speech perception? and (4) Can we demonstrate the usefulness of this knowledge for developing synthesis of artificial talking faces? A multi-talker database is being recorded for this project. Recordings include acoustic, optical (with retroreflector labeling on the faces), and physiologic signals (Electromagnetic Midsagittal Articulography, EMA). Studies follow up on recent results in the literature showing high correlations between acoustic and optical speech measures, and between external (optical) and internal (physiologic) speech measures. Studies include perceptual experiments to determine segmental and prosodic speech characteristics. We are investigating the neurophysiologic bases for visual speech perception in deaf and hearing adults using electrophysiologic measures. Optical speech synthesis is being employed to (1) test our understanding of the cues that control visual perception of phonemes and prosody, and (2) investigate the neurophysiological bases for human sensitivity to optical speech characteristics.
The project will impact engineering in the areas of speech synthesis and audiovisual automatic speech recognition. It will extend understanding of human speech perception and its neurophysiologic bases in deaf and hearing individuals. The applications that will derive from the project include second language training, enhancement of speech transmission quality and recognition accuracy under conditions of environmental noise, efficient storage and transmission of optical speech information, stimulus control in audiovisual perceptual experiments, and communication enhancement for hearing-impaired people. The multidisciplinary team of principal investigators represents the fields of cognitive science, speech perception, linguistics, electrical engineering, and neurophysiology.
