User Adaptation of AAC Device Voices

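Basic Information

  • Grant number:
    7219057
  • Principal investigator:
  • Amount:
    $150,100
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2007
  • Funding country:
    United States
  • Project period:
    2007-01-01 to 2008-06-30
  • Status:
    Completed

Project Abstract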

DESCRIPTION (provided by applicant): A wide range of individuals cannot communicate by voice. Voice-enabled Augmentative and Alternative Communication (AAC) devices are often the only channel by which these individuals can communicate. While many voice-enabled AAC devices are currently available, they lack the important ability to generate customized speech that mimics aspects of the user's past or intermittently available speech. Modern "concatenative" speech synthesis technology can mimic a given speaker's voice by excising speech fragments from a recorded speech database ("acoustic inventory") and recombining them into output speech using sophisticated algorithms. It requires, however, a large number of recordings and highly consistent pronunciation from the speaker. Many AAC users cannot meet these requirements because they have already lost the capability to speak or cannot speak with adequately consistent pronunciation.
A newer technology, voice transformation (VT), can transform speech spoken by a "source" speaker into speech that is perceived as spoken by a specific "target" speaker. To tune the transformation system, parallel "training recordings" of the same text are needed from the source and target speakers. Far fewer training recordings are needed than for a high-quality acoustic inventory. We propose to use VT in combination with speech synthesis to convert the synthesis system's acoustic inventory into an acoustic inventory that mimics the target speaker's voice. The training recordings can consist of old home videos, or fragmented recordings produced during periods of intact speech, provided that they contain at least one sample of each phoneme.
In Phase I, we will develop and evaluate a VT-based synthesis system. The project will use high-quality and home-video-quality recordings from male and female adults and children to create limited acoustic inventories (adequate to generate a specific set of test sentences) and VT training recordings. Perceptual experiments will be conducted to evaluate voice quality and perceived speaker identity. Phase II will focus on developing complete acoustic inventories for several canonical speakers, selected to cover a range of speaker characteristics, and on producing portable, user-friendly software. The anticipated commercial offering consists of (i) software components to be licensed to AAC vendors and (ii) a service consisting of collection and processing of recordings and creation of personalized acoustic inventories.
Speech communication ability is impaired or absent in millions of Americans owing to trauma and to neurological disorders and diseases, including autism, Parkinson's disease, and stroke. AAC devices that are operated via switches, keyboards, and a broad range of other input devices, and that have synthetic speech as output, are often the only way in which these individuals can communicate. Without AAC devices, these individuals may suffer from severe social and psychological isolation and may be unable to lead productive lives. A psychologically important feature that no currently available system has is the ability to speak with the user's voice, i.e., to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to produce intermittently. The proposed project will use VT technology to accomplish this goal. VT technology requires recordings of the user, but there is substantial flexibility as to the nature and quantity of these recordings; they may consist of home videos or of fragmentary speech, provided that at least some samples are available of each speech sound in the language.
The goal of the application is to develop a synthetic voice for an AAC system that sounds like the individual using the system (before they lost the ability to speak), without requiring much recorded data from the original talker. The system works by first creating a synthetic "base" voice (or set of base voices) using professional actors, who must provide a fairly large inventory of speech data. Using the base voice and a small sample from the target talker (i.e., one containing at least one instance of each phoneme), a new synthetic voice is created by essentially modulating parameters in the base voice so that it takes on characteristics of the target talker. The ability to create a voice that sounds like the original talker without much data from that talker would be a significant advantage.
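As a rough illustration only, the two requirements stated in the abstract (at least one sample of each phoneme, and a transformation tuned from source and target recordings) can be sketched in Python. This is not the project's method: the function names are hypothetical, and the transformation shown is a deliberately simplified stand-in that maps only pitch (F0) by matching the mean and spread of log-F0 between speakers, whereas a full VT system would also transform spectral characteristics and would use parallel recordings.

```python
import math
from statistics import mean, pstdev


def missing_phonemes(required, recorded_labels):
    """Return the required phonemes that have no sample in the recordings."""
    return sorted(set(required) - set(recorded_labels))


def fit_log_f0_transform(source_f0, target_f0):
    """Fit a mean/variance mapping between two speakers' log-F0 values.

    Assumption: inputs are voiced-frame F0 values in Hz, and each speaker's
    values are not all identical (otherwise the spread would be zero).
    """
    src = [math.log(f) for f in source_f0]
    tgt = [math.log(f) for f in target_f0]
    return mean(src), pstdev(src), mean(tgt), pstdev(tgt)


def convert_f0(f0, params):
    """Map one source-speaker F0 value (Hz) into the target speaker's range."""
    mu_s, sd_s, mu_t, sd_t = params
    z = (math.log(f0) - mu_s) / sd_s  # position within the source distribution
    return math.exp(mu_t + z * sd_t)  # same position in the target distribution


# Hypothetical toy data: a three-phoneme set and a few F0 measurements.
gaps = missing_phonemes(["a", "b", "k"], ["a", "k", "a"])
params = fit_log_f0_transform([100.0, 200.0], [200.0, 400.0])
converted = convert_f0(100.0, params)
```

In this toy case `gaps` lists the one phoneme with no recorded sample, and the lowest source pitch maps to the lowest target pitch. Running the coverage check first reflects the abstract's condition that the target recordings must contain every phoneme before an inventory can be converted.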

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Jan van Santen

Voice Transformation for Dysarthria - Phase I
  • Grant number:
    7162050
  • Fiscal year:
    2006
  • Award amount:
    $150,100
  • Project type:

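Similar NSFC (National Natural Science Foundation of China) grants

Language function plasticity in patients with adult-type diffuse glioma
  • Grant number:
    82303926
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Quantifying the immune microenvironment of high-grade adult-type diffuse glioma and predicting postoperative recurrence risk using MRI combined with multi-omics features
  • Grant number:
    82302160
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Role of SMC4/FoxO3a-mediated proliferation of CD38+HLA-DR+CD8+ T cells in the pathogenesis of MAS in adult-onset Still's disease
  • Grant number:
    82302025
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Predicting pathogens of adult pulmonary infection using deep learning on fused multi-source heterogeneous data
  • Grant number:
    82302311
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund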

Similar overseas grants

Effects of deep brain stimulation (DBS) on laryngeal function and associated behaviors in Parkinson Disease
  • Grant number:
    10735930
  • Fiscal year:
    2023
  • Award amount:
    $150,100
  • Project type:
Conference on Implantable Auditory Prostheses
  • Grant number:
    10606813
  • Fiscal year:
    2023
  • Award amount:
    $150,100
  • Project type:
How infant-directed speech organizes the attentional state of infants
  • Grant number:
    10887662
  • Fiscal year:
    2023
  • Award amount:
    $150,100
  • Project type:
Identifying the presence of a code-switch: Evaluating the role of acoustic cues.
  • Grant number:
    10751509
  • Fiscal year:
    2023
  • Award amount:
    $150,100
  • Project type:
Assessment of Cochlear Dysfunction in Black and White Adults with Stage 2 Hypertension Using High-Frequency Distortion Product Otoacoustic Emissions
  • Grant number:
    10652892
  • Fiscal year:
    2023
  • Award amount:
    $150,100
  • Project type: