
Articulatory Speech Synthesis for Natural User Interfaces


Basic Information

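Grant number:
463376-2014
Principal investigator:
Penn, Gerald
Amount:
$146,400 (approx.)
Host institution:
University of Toronto
Host institution country:
Canada
Program:
Strategic Projects - Group
Fiscal year:
2015
Funding country:
Canada
Duration:
2015-01-01 to 2016-12-31
Project status:
Completed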

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=577054
Keywords:
Articulatory Speech Synthesis Natural User

Project Abstract

For years, articulatory synthesis research has been largely overshadowed by formant-based and acoustic-based speech synthesis techniques. While successful in some domains (e.g., voice-based databases), these techniques still cannot produce natural-looking and natural-sounding speech from text for an arbitrary speaker. Natural-looking and natural-sounding speech technology is one of the next major milestones in voice-based interaction for natural user interfaces. Articulatory speech synthesis has progressed steadily at the fringes of both industrial and academic interest and is now poised to provide the necessary platform to overcome basic problems in speech production; we believe it represents the next major advance in speech synthesis technology. Because of the structural complexity of the human vocal tract and of speech production behaviour, prior research in 3-dimensional articulatory synthesis has focused on analyzing and modeling narrowly defined aspects of speech production and vocal tract structure. Rather than modeling a few sub-components of the overall vocal tract to produce a limited set of unnatural utterances, a more complete platform is needed, one that allows vocal tract sub-components to be integrated and tested within a working articulatory speech synthesizer that uses the best available technologies for the entire vocal tract. For decades, the Haskins 2D Articulatory Speech Synthesizer has been commonly used, despite the well-known limitations on the shapes and sounds it can produce and its lack of accurate representations of either generic or speaker-specific production parameters. Advances such as VTL by Birkholz have made progress in 3D articulatory speech synthesis, but they remain visually undeveloped and lack biomechanical foundations.
To overcome these limitations and provide a platform for new research in articulatory speech synthesis, we propose to construct and evaluate an aerodynamically driven articulatory speech synthesizer based on a comprehensive, parameterized 3D biomechanical model of the vocal and facial articulators that is capable of producing both visible and acoustic speech and non-speech.
