Hybrid Speech Synthesis for Voice Output Communication Aids

Basic Information

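Grant number:
7156322
Principal investigator:
SUSAN R HERTZ
Amount:
$373,500
Host institution:
SYNFONICA, LLC
Host institution country:
United States
Project category:
Fiscal year:
2004
Funding country:
United States
Project period:
2004-04-01 to 2008-07-31
Project status:
Completed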

Source:
https://reporter.nih.gov/project-details/7156322
Keywords:
clinical research, communication, speech, voice

Project Abstract

DESCRIPTION (provided by applicant): NovaSpeech proposes to develop an innovative perceptually-oriented hybrid approach to unconstrained speech synthesis for generating individualized, customized voices of either gender and any age. The system will provide human-sounding, intelligible, and mimetic speech, yet have small storage requirements, be able to support the cost-efficient addition of new voices, and be suitable for implementation on virtually any hardware platform. As a result, the technology will be well-suited to virtually any unlimited vocabulary synthesis application, but be of special benefit to speech-impaired individuals, who have a particularly great need for natural-sounding, individualized voices on a broad range of devices. With the hybrid system, individuals who know they will lose their voice due to illness or surgery will be able to cost-efficiently capture and utilize their pre-injury voice in a voice output communication aid; and all speech-impaired users will be able to obtain reliable, appropriate, individualized voices that can grow with them as they mature and age. No existing synthesis approach meets these needs, with each type of technology trading off one desirable property for another, be it low storage requirements for natural voice quality, or human voice quality for flexibility. The hybrid approach overcomes these limitations by integrating, in a novel and principled way, the best features of two well-known synthesis techniques: corpus-based waveform concatenation and rule-based formant synthesis. Capitalizing on a number of important perceptual principles, the system will prestore only a small number of intrinsic units, such as stressed vowels, from the target speaker, and synthesize other, adaptable units by rule. Thus with only a small prestored speech corpus, and a common set of rules across voices, it will produce speech that sounds like the intended speaker. 
In its proposed Phase II project, NovaSpeech will develop a complete hybrid prototype text-to-speech (TTS) system for eight voices in General American English: six base voices (male and female children, adults, and elderly adults) and the voices of two speakers who know they will lose their ability to speak naturally as a result of future laryngectomies. Year 1 will focus on exploring possible system architectures, implementing rules for adaptable units, and exploring, through perceptual experiments, possible strategies for storing and selecting intrinsic units. Year 2 will focus on implementing a fully functional hybrid TTS prototype for the six base voices. By month six of Year 2 at the latest, the company will verify the ability to quickly add new voices by implementing the voices of the laryngectomy patients, providing them with functional systems for their voices, and obtaining feedback from them and from those who know them about the quality of the voices and the system features. The ultimate objective of the hybrid project is to improve the naturalness and mimetic quality of speech synthesized from unrestricted symbolic input, with the particular goal of enhancing the utility and flexibility of voice output communication aids for speech-impaired individuals.
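
The division of labor described in the abstract (prestored intrinsic units from the target speaker, all other units generated by rule) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the project: the `Phone` type, the small vowel set, the `is_intrinsic` test, and the `plan_units` function are hypothetical names, and the sketch only decides which source each unit would come from. It does not perform any actual waveform concatenation or formant synthesis.

```python
from dataclasses import dataclass

# Hypothetical phone representation; not taken from the project itself.
@dataclass(frozen=True)
class Phone:
    symbol: str     # phone label, e.g. "AE"
    stressed: bool  # whether the phone carries lexical stress

# Illustrative vowel inventory (an assumption, deliberately incomplete).
VOWELS = {"AA", "AE", "AH", "EH", "IH", "IY", "OW", "UW"}

def is_intrinsic(phone: Phone) -> bool:
    """Treat stressed vowels as intrinsic units, the example the abstract gives."""
    return phone.stressed and phone.symbol in VOWELS

def plan_units(phones, prestored):
    """Label each phone with the source it would be rendered from.

    'prestored' -> taken from the speaker's small recorded inventory
    'rule'      -> generated by the rule set shared across voices
    """
    plan = []
    for phone in phones:
        if is_intrinsic(phone) and phone.symbol in prestored:
            plan.append((phone.symbol, "prestored"))
        else:
            # Adaptable units, and intrinsic units missing from the
            # inventory, fall back to rule-based generation.
            plan.append((phone.symbol, "rule"))
    return plan

# Usage: the word "cat" with a stand-in inventory holding one stressed vowel.
phones = [Phone("K", False), Phone("AE", True), Phone("T", False)]
prestored = {"AE": [0.0, 0.1, 0.0]}  # placeholder samples, not real audio
plan = plan_units(phones, prestored)
```

Under these assumptions, adding a new voice would mean supplying a new `prestored` inventory while the rule path stays unchanged, which mirrors the abstract's claim of a small per-speaker corpus and a common rule set across voices.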