User Adaptation of AAC Device Voices

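Basic Information

Grant number: 7219057
Principal investigator: Jan van Santen
Amount: $150.1K
Host institution: BIOSPEECH, INC.
Host institution country: United States
Project category:
Fiscal year: 2007
Funding country: United States
Project period: 2007-01-01 to 2008-06-30
Project status: Completed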

Source: https://reporter.nih.gov/project-details/7219057
Keywords:
Acoustics Adult Age Algorithms American Augmentative and Alternative Communication device Autistic Disorder Characteristics Child Collection Communication Communication Aids for Disabled Computer software Computers Data Data Set Databases Electronics Equipment and supply inventories Faculty Female Future Goals Home environment Individual Inferior Language Lead Libraries Licensing Life Modeling Modification Nature Nerve Degeneration Neurodevelopmental Disorder Numbers Output Parkinson Disease Partner Communications Pattern Persons Phase Phonetics Pliability Process Range Rate Records Robotics Sampling Services Shipping Ships Signal Transduction Source Speech Speech Acoustics Speech Sound Stroke System Technology Telephone Testing Text Time Training Trauma Traumatic Brain Injury Vendor Vocabulary Voice Voice Quality Voice Training Work alternative communication analog base computerized data processing digital improved input device male men's group movie nervous system disorder programs psychologic research study size social sound user friendly software

Project Abstract

DESCRIPTION (provided by applicant): A wide range of individuals cannot communicate by voice. Voice-enabled Augmentative and Alternative Communication (AAC) devices are often the only channel by which these individuals can communicate. While many voice-enabled AAC devices are currently available, they lack the important ability to generate customized speech that mimics aspects of the user's past or intermittently available speech. Modern "concatenative" speech synthesis technology can mimic a given speaker's voice by excising speech fragments from a recorded speech database ("acoustic inventory") and recombining these into output speech using sophisticated algorithms. It requires, however, a large amount of recordings and a high degree of consistency in the speaker's pronunciation. Many AAC users cannot meet these requirements because they have already lost the capability to speak or cannot speak with adequate consistency of pronunciation. A new type of technology, voice transformation (VT) technology, is available that can transform speech spoken by a "source" speaker into speech that is perceived as spoken by a specific "target" speaker. To tune the transformation system, parallel "training recordings" of the same text are needed from the source and target speakers. The amount of training recordings is far less than what is needed for a high-quality acoustic inventory. We propose to use VT in combination with speech synthesis to convert the synthesis system's acoustic inventory into an acoustic inventory that mimics the target speaker's voice. The training recordings can consist of old home videos, or fragmented recordings produced during periods of intact speech, provided that they contain at least one sample of each phoneme. In Phase I, we will develop and evaluate a VT-based synthesis system.
The project will use high-quality and home-video-quality recordings from male and female adults and children to create limited acoustic inventories (adequate to generate a specific set of test sentences) and VT training recordings. Perceptual experiments will be conducted to evaluate voice quality and perceived speaker identity. Phase II will focus on developing complete acoustic inventories for several canonical speakers that will be selected to cover a range of speaker characteristics, and on producing portable, user-friendly software. The anticipated commercial offering consists of (i) software components to be licensed to AAC vendors and (ii) a service consisting of collection and processing of recordings and creation of personalized acoustic inventories. Speech communication ability is impaired or absent in millions of Americans due to neurological disorders and diseases and to trauma, including autism, Parkinson's disease, and stroke. Augmentative and Alternative Communication (AAC) devices that are operated via switches, keyboards, and a broad range of other input devices, and that have synthetic speech as output, are often the only manner in which these individuals can communicate. Without AAC devices, these individuals may suffer from severe social and psychological isolation, and may be unable to lead productive lives. A psychologically important feature that no currently available system has is the ability to speak with the user's voice, i.e., the ability to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. The proposed project will use voice transformation (VT) technology to accomplish this goal.
VT technology requires recordings of the user to be available, but there is substantial flexibility as to the nature and quantity of these recordings; they may consist of home videos or of fragmentary speech, provided that at least some samples are available of each speech sound in the language. The goal of the application is to develop a synthetic voice for an AAC system that sounds like the individual using the system (before they lost the ability to speak), without requiring very much recorded data on the part of the original talker. The system works by first creating a synthetic "base" voice (or set of base voices) using professional actors who must provide a fairly large inventory of speech data. Using the base voice and a small sample from the target talker (i.e., containing at least one instance of each phoneme), a new synthetic voice is created by essentially modulating parameters in the base voice so that it takes on characteristics of the target talker. The ability to create a voice that sounds like the original talker without much data from the original talker would be a significant advantage.
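
Two ideas in the abstract can be illustrated with a small Python sketch: the requirement that the target speaker's recordings contain at least one instance of each phoneme, and the notion of adjusting the base voice's parameters toward the target talker. The abstract does not specify the project's actual VT algorithm, so everything here is an assumption made for illustration: the truncated phoneme set, the function names, and the per-dimension mean and variance matching, which is a simple statistical stand-in rather than the applicants' method.

```python
from statistics import mean, pstdev

# Hypothetical, truncated phoneme set for illustration only; a real check
# would use the full phoneme inventory of the target language.
PHONEMES = {"AA", "IY", "UW", "M", "S", "T"}


def missing_phonemes(transcripts, inventory=PHONEMES):
    """Return the phonemes in `inventory` that never occur in the recordings.

    `transcripts` is a list of phoneme sequences, one per recording.
    """
    seen = {phoneme for sequence in transcripts for phoneme in sequence}
    return inventory - seen


def fit_mean_variance_map(source_frames, target_frames):
    """Fit a per-dimension scale and shift from source to target features.

    Each frame is a list of acoustic feature values (hypothetical features).
    This is a toy stand-in for VT training, not the project's algorithm.
    """
    params = []
    for src_dim, tgt_dim in zip(zip(*source_frames), zip(*target_frames)):
        src_mu, tgt_mu = mean(src_dim), mean(tgt_dim)
        src_sd, tgt_sd = pstdev(src_dim), pstdev(tgt_dim)
        # Fall back to a pure shift when the source dimension is constant.
        scale = tgt_sd / src_sd if src_sd > 0 else 1.0
        params.append((scale, src_mu, tgt_mu))
    return params


def transform(frame, params):
    """Map one base-voice frame toward the target speaker's statistics."""
    return [
        tgt_mu + scale * (value - src_mu)
        for value, (scale, src_mu, tgt_mu) in zip(frame, params)
    ]


# Coverage check on two short hypothetical recordings.
recordings = [["M", "AA", "S"], ["T", "IY"]]
print(missing_phonemes(recordings))

# Fit the toy mapping on made-up two-dimensional frames and apply it.
source = [[1.0, 10.0], [3.0, 14.0]]
target = [[2.0, 20.0], [6.0, 22.0]]
params = fit_mean_variance_map(source, target)
print(transform([2.0, 12.0], params))
```

In this sketch the coverage check would flag any phoneme for which no target-speaker sample exists, which is the condition the abstract places on home videos and fragmentary recordings. The mapping only matches feature means and spreads, so it ignores the frame-by-frame alignment that parallel training recordings of the same text would make possible.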
