
Next-Generation Expressive Personalized Voices for Speech-Generating Devices

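Basic Information

Grant number:
10547241
Principal investigator:
H. Timothy Bunnell
Amount:
approximately $275,800
Host institution:
SYNFONICA, LLC
Host institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-08-15 to 2024-08-14
Project status:
Completed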

Source:
https://reporter.nih.gov/project-details/10547241
Keywords:
ALS patients Adoption Adult Age Algorithms Amyotrophic Lateral Sclerosis Augmentative and Alternative Communication Characteristics Child Child Health Client Depressed mood Disease Dysarthria Emotions Encapsulated Evaluation Female Generations Goals Government Human Hybrids Individual Knowledge Laboratory Research Learning Linguistics Machine Learning Methods Modeling Network-based Neurodegenerative Disorders Onset of illness Outcome Output Persons Phase Process Production Reading Records Rehabilitation therapy Risk Running Services Speech Structure Surveys System Technology Text Training Voice Voice Quality base commercial application communication device deep neural network design experience experimental study improved knowledge base machine learning algorithm male mimetics next generation novel sound success virtual vocal tract

Project Abstract

The creation of personalized synthetic voices has wide application in medical/rehabilitation settings for patients who rely on a speech-generating device (SGD) for communication. One common application is voice banking, wherein a person who risks losing their voice, such as somebody with a neurodegenerative disease like Amyotrophic Lateral Sclerosis (ALS), records their own speech before the onset of disease-related dysarthria for later use in an SGD that mimics their natural speech characteristics. While the technology underlying the creation of such personalized synthetic voices is growing in maturity and adoption by SGD users, it still suffers from two primary limitations: a lack of expressiveness and a burdensome amount of recording needed to create highly natural-sounding voices. The proposed project aims to remedy this situation by marrying the machine-learning technology behind ModelTalker, a pioneering voice-banking text-to-speech service developed at Nemours Children’s Health, with the knowledge-based technology underlying Synfony, a rule-based text-to-speech system developed by Synfonica LLC, which is capable of generating a variety of speech styles and expressive modes. The expert knowledge built into Synfonica will be used to design an optimal set of sentences for voice bankers to record, and its algorithms for the generation of natural-sounding prosody in different modes and styles will be integrated into ModelTalker’s machine-learning algorithms, creating a hybrid system that embraces the best qualities of both approaches. The new text-to-speech (TTS) system resulting from this project will (a) require a minimal amount of recorded speech from the voice banker, (b) accurately capture their vocal identity, and (c) be structured such that new expressive modes and speech styles can be added easily without additional recording.
The feasibility of the project will be demonstrated by recording the voices of an adult male, an adult female, and a child, and generating TTS voices that can speak in three expressive modes (neutral, happy, and sad). Perceptual experiments will be run to evaluate their intelligibility, naturalness, success in capturing the vocal identity of the speaker, and the appropriateness of their expressive modes. In general, the project will be a major step forward in enabling the users of personalized synthetic voices to express their emotions and intentions.
