
ITR-Collaborative Research: Development and Evaluation of a Hybrid Concatenative/Rule-Based Visual Speech Synthesis System


Basic Information

Award number: 0312434
Principal investigator: Lynne Bernstein
Amount: approximately $216,800
Institution: House Ear Institute
Institution country: United States
Award type: Standard Grant
Fiscal year: 2003
Funding country: United States
Duration: 2003-07-15 to 2007-06-30
Status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=0312434&HistoricalAwards=false
Keywords: ITR Collaborative Research Development Evaluation

Project Abstract

This project's goal is to develop a synthetic talking face. Humans developed sophisticated abilities to perceive and integrate auditory and visual (AV) speech information long before they were required to read printed text presented by computers. Seeing as well as hearing speech reduces cognitive workload and improves comprehension compared with hearing the talker alone. Realizing the advantages of AV speech for human-computer interaction requires synthesizing visual speech, which provides an unlimited supply of visual speech images without the need to pre-record data. The approach here is to drive optical speech synthesis with speech acoustics. Computational methods are used to obtain models of the transformation from acoustics to optics. The method capitalizes on the coarticulatory information from speech production that is captured by diphones to produce naturalistic visual speech images. The method is applied directly to natural acoustic speech features to achieve coordination between the acoustic and optical signals. The synthesized visual speech is based on a texture-mapped wireframe model. A natural speech corpus on which to base the synthesis is being obtained from simultaneously recorded 3-D optical, audio, and video data. Synthesis development is guided by human perceptual testing. The corpus, archived on DVD, will be disseminated. The project will expand access to information and improve knowledge acquisition for diverse groups of individuals, for example: children still acquiring literacy skills; adults with inadequate literacy; individuals who are using a second language; and individuals with hearing loss who rely on audiovisual speech. Results will be disseminated broadly through professional outlets. Graduate and undergraduate students will participate.
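The abstract does not specify which computational model transforms acoustics into optics. Purely as an illustration, the sketch below assumes a multilinear least-squares regression from per-frame acoustic feature vectors to 3-D optical marker coordinates. The function names, feature dimensions, small ridge term, and toy data are all hypothetical, and the project's diphone-based handling of coarticulation is not modeled here.

```python
"""Hypothetical sketch: fit a linear acoustic-to-optical mapping by least squares."""
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | B], copied so the caller's lists are not modified.
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(acoustic: Matrix, optical: Matrix, ridge: float = 1e-8) -> Matrix:
    """Return W ((d+1) x k) minimizing ||[X 1] W - Y||^2 + ridge * ||W||^2.

    acoustic: one row of d acoustic features per frame (hypothetical features).
    optical:  one row of k optical coordinates per frame (e.g. marker x/y/z).
    """
    x = [frame + [1.0] for frame in acoustic]  # append a bias term to each frame
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += ridge  # tiny ridge keeps the normal equations well-conditioned
    return solve(gram, matmul(xt, optical))


def predict(weights: Matrix, frame: List[float]) -> List[float]:
    """Map one acoustic frame to predicted optical coordinates."""
    x = frame + [1.0]
    return [sum(xi * weights[i][j] for i, xi in enumerate(x)) for j in range(len(weights[0]))]


if __name__ == "__main__":
    # Toy data from a known linear map (last row is the bias), so the fit can be checked.
    true_w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [0.1, 0.2, 0.3]]
    acoustic = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
    optical = [predict(true_w, f) for f in acoustic]
    w = fit_linear_map(acoustic, optical)
    print([round(v, 3) for v in predict(w, [3.0, 2.0])])
```

A linear map was chosen only because it is the simplest model that makes "a model of the transformation from acoustics to optics" concrete; a real system would use richer acoustic features and, per the abstract, diphone-level context.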
