HCC: High-Quality Compression, Enhancement, and Personalization of Text-to-Speech Voices

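Basic information

Award number:
0713617
Principal investigator:
Alexander Kain
Amount:
$400,000
Host institution:
Oregon Health & Science University
Host institution country:
United States
Project type:
Continuing Grant
Fiscal year:
2007
Funding country:
United States
Duration:
2007-09-01 to 2011-08-31
Project status:
Completed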

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0713617&HistoricalAwards=false
Keywords:
HCC Quality Compression Enhancement Personalization

Project abstract

The vast variability of the human speech signal remains a central challenge for Text-to-Speech (TTS) systems. The objective of this research is to develop TTS technologies that focus on elimination of concatenation errors, and accurate speech modifications in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics. The investigators are exploring an asynchronous interpolation model (AIM), which promises to provide for high-quality and flexible TTS. The core idea of AIM is to represent a short region of speech as a composition of several types of features called streams. Each stream is computed by asynchronous interpolation of basis vectors. Each basis vector is associated with a particular phoneme, allophone, or more specialized unit. Thus, the speech region is described by the varying degrees of influence of several types of preceding and following acoustic features. Using AIM, the investigators are also developing methods to optimally compress the acoustic inventories of TTS systems, given a size or a quality constraint, and to adapt the system to a new voice, given a few training samples. The system being researched forms a hybrid between traditional concatenative and formant-based synthesis, having advantages of both, resulting in a high-quality, optimized TTS system with voice adaptation capabilities. TTS has generally recognized societal benefits for universal access, education, and information access by voice. Our research will make it possible, for example, to build personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds.
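
The stream idea in the abstract can be illustrated with a rough sketch. This is not the investigators' implementation: the sigmoid weighting, the stream names, the basis-vector values, and the functions `sigmoid_weight`, `interpolate_stream`, and `synthesize_region` are all assumptions made for illustration. It only shows what "asynchronous" could mean in practice: each stream moves from the basis vector of the preceding unit to that of the following unit on its own schedule.

```python
import math

def sigmoid_weight(t, midpoint, slope):
    """Influence of the following unit at normalized time t in [0, 1].

    Assumption: a sigmoid transition; the abstract does not specify the weight function.
    """
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))

def interpolate_stream(left, right, midpoint, slope, n_frames):
    """Interpolate one stream between two basis vectors over n_frames frames."""
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        w = sigmoid_weight(t, midpoint, slope)
        frames.append([(1.0 - w) * a + w * b for a, b in zip(left, right)])
    return frames

def synthesize_region(streams, n_frames=11):
    """Compose a speech region from several independently timed streams."""
    return {
        name: interpolate_stream(s["left"], s["right"], s["midpoint"], s["slope"], n_frames)
        for name, s in streams.items()
    }

# Hypothetical basis vectors for a preceding and a following unit.
# Each stream has its own transition midpoint, so the streams change asynchronously.
streams = {
    "formants": {"left": [700.0, 1200.0], "right": [300.0, 2300.0],
                 "midpoint": 0.4, "slope": 12.0},
    "energy": {"left": [0.8], "right": [0.5],
               "midpoint": 0.6, "slope": 8.0},
}

region = synthesize_region(streams)
```

In this toy setup the formant stream reaches its halfway point at 40% of the region while the energy stream is still dominated by the preceding unit, which is the kind of per-stream timing the abstract attributes to AIM.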
