CHS: Small: Compounding Dividends on Voice Banking

Basic Information

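  • Award number:
    1816726
  • Principal investigator:
  • Amount:
    $104,100
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-03-01 to 2022-12-31
  • Project status:
    Completed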

Project Abstract

Text-to-speech (TTS) synthesis has become a successful and ubiquitous technology. The application of TTS that motivates this research is its use in Augmentative and Alternative Communication (AAC). According to the American Speech-Language-Hearing Association (ASHA), more than two million people in the United States have severe communication disorders that impair their ability to speak. Many of them use AAC devices that generate spoken output with TTS to support communication. Historically, AAC users have had access to a relatively small family of generic TTS voices that are neither unique to them nor, typically, appropriate to their age or dialect. Advances in TTS technology, however, make it possible to create personalized synthetic voices that capture the unique vocal identity of AAC device users, provided they can record enough speech. This allows patients with neurodegenerative diseases such as ALS to "bank" their voice (that is, to record examples of their speech that can later be used to create a personal TTS voice) before the disease progresses to the point where they can no longer speak. Unfortunately, a major barrier to voice banking, especially for patients who may already have some difficulty speaking, is the amount of speech needed to create a natural-sounding TTS voice that fully captures the voice banker's vocal identity. To lower this barrier, this research will combine parallel formant synthesis, a type of speech synthesis developed several decades ago, with deep learning techniques that allow a computer to learn how to control the parameters of the parallel formant synthesizer so that it reproduces a target speaker's speech from examples of that speech.

A parallel formant synthesizer will be implemented and trained to model speech recorded by voice bankers, and its output will be compared with that of other synthesizers trained on the same speech data. The comparison will use objective measures of similarity between synthetic and natural utterances, as well as subjective ratings of voice quality and similarity from human listeners. This will be the first step toward a voice conversion system based on parallel formant synthesis that can create TTS voices from a small number of natural speech samples and that is better able to model the expressive nature of natural speech.

Despite advances in TTS technology, applying it to voice banking still poses several challenges: (a) the amount of speech required (several hours) to create the most natural-sounding TTS voices with unit selection or hybrid DNN/unit selection is prohibitive for most voice bankers; (b) existing voice conversion techniques that do not require large amounts of parallel speech from the target talker generally produce speech that sounds less natural, and less like the target speaker, than concatenative synthesis; and (c) both concatenative and statistical parametric techniques produce speech that is only as expressive as the corpus on which they were built or trained. Because parallel formant synthesis is based explicitly on the perceptually most salient features of natural speech, and lends itself to modeling laryngeal, suprasegmental, and segmental features independently, it should be better able to address all three challenges. As a proof of concept, a parallel formant synthesis (PFS) vocoder with DNN-based parameter estimation will be implemented within the Merlin DNN synthesis framework, so that the output of the PFS system can be compared directly with output generated by the WORLD and MagPhase vocoders.
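
The abstract does not describe the synthesizer's internals. As a rough illustration only, the sketch below assumes a Klatt-style second-order digital resonator for each formant, a plain impulse-train voicing source, and frame-wise formant frequencies, bandwidths, and amplitudes (the kind of control parameters a DNN would be trained to predict). The function names `pulse_train` and `parallel_formant_synth`, the 16 kHz sample rate, and the 80-sample frame hop are hypothetical; noise excitation, spectral shaping, and anything specific to the project's Merlin-based vocoder are omitted.

```python
import math

import numpy as np


def pulse_train(f0_frames, fs, hop):
    """Voiced excitation: one unit impulse per pitch period, F0 (Hz) given per frame."""
    n = len(f0_frames) * hop
    x = np.zeros(n)
    phase = 0.0
    for i in range(n):
        phase += f0_frames[i // hop] / fs  # fraction of a pitch period elapsed
        if phase >= 1.0:
            phase -= 1.0
            x[i] = 1.0
    return x


def parallel_formant_synth(f0_frames, freqs, bws, amps, fs=16000, hop=80):
    """Drive K second-order resonators in parallel and sum their weighted outputs.

    freqs, bws, amps: shape (n_frames, K); formant frequency (Hz), bandwidth (Hz),
    and linear amplitude. Coefficients are updated every `hop` samples while each
    resonator keeps its internal state, so parameters can vary frame by frame.
    """
    freqs, bws, amps = (np.asarray(a, dtype=float) for a in (freqs, bws, amps))
    x = pulse_train(f0_frames, fs, hop)
    out = np.zeros_like(x)
    T = 1.0 / fs
    for k in range(freqs.shape[1]):
        y1 = y2 = 0.0  # previous two outputs of resonator k
        for i, xi in enumerate(x):
            j = i // hop
            # Klatt-style resonator: y[n] = A x[n] + B y[n-1] + C y[n-2], unity gain at 0 Hz
            C = -math.exp(-2.0 * math.pi * bws[j, k] * T)
            B = 2.0 * math.exp(-math.pi * bws[j, k] * T) * math.cos(2.0 * math.pi * freqs[j, k] * T)
            A = 1.0 - B - C
            y = A * xi + B * y1 + C * y2
            y2, y1 = y1, y
            out[i] += amps[j, k] * y
    return out


# Example: 0.1 s of a steady vowel-like sound at 16 kHz (all values illustrative).
n_frames = 20
f0 = [120.0] * n_frames
freqs = [[500.0, 1500.0, 2500.0]] * n_frames
bws = [[60.0, 90.0, 120.0]] * n_frames
amps = [[1.0, 0.5, 0.25]] * n_frames
speech = parallel_formant_synth(f0, freqs, bws, amps)
```

Because each formant has its own branch and amplitude, a parameter predictor can adjust one formant's level without disturbing the others, which is one reason a parallel structure is attractive for learned control.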

Training will use corpora drawn from the same set of 1,600 utterances, recorded by multiple individuals who contributed their recordings to the ModelTalker project. The selected target talkers will be balanced for gender and will span a wide range of English dialects; speakers with noticeable dysarthria will be excluded. Objective comparisons will be based on the Mel-Cepstral Difference (MCD) between synthetic and natural sentence tokens that were not used to train the synthesizers. Subjective measures (Mean Opinion Scores) will be collected from human listeners via Amazon Mechanical Turk.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
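
The abstract names Mel-Cepstral Difference (MCD), more commonly called mel-cepstral distortion, as its objective measure but does not give an exact definition. A widely used form, assumed here, computes (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²) per frame in dB and averages over frames, excluding the energy coefficient c0. The sketch assumes the natural and synthetic mel-cepstra are already time-aligned (for example, by dynamic time warping or by using natural durations); the function name `mel_cepstral_distortion` and the c0 convention are illustrative choices, not necessarily the project's.

```python
import numpy as np

MCD_SCALE = 10.0 / np.log(10.0)  # converts natural-log cepstral distance to dB


def mel_cepstral_distortion(ref, syn, exclude_c0=True):
    """Frame-averaged MCD (dB) between two time-aligned mel-cepstral sequences.

    ref, syn: arrays of shape (n_frames, n_coeffs). Column 0 is treated as the
    energy term c0 and skipped when exclude_c0 is True (a common convention).
    """
    ref = np.asarray(ref, dtype=float)
    syn = np.asarray(syn, dtype=float)
    if ref.shape != syn.shape or ref.ndim != 2:
        raise ValueError("expected time-aligned arrays of shape (n_frames, n_coeffs)")
    start = 1 if exclude_c0 else 0
    diff = ref[:, start:] - syn[:, start:]
    per_frame = MCD_SCALE * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


rng = np.random.default_rng(0)
natural = rng.normal(size=(100, 25))  # stand-in for held-out natural mel-cepstra
synthetic = natural.copy()
synthetic[:, 3] += 1.0  # unit error in one coefficient in every frame
```

Here `mel_cepstral_distortion(natural, synthetic)` is about 6.14 dB, the per-frame value produced by a unit error in a single coefficient; identical inputs give 0.0.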

Project Outputs

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Unsupervised Training of a DNN-Based Formant Tracker
  • DOI:
    10.21437/interspeech.2021-1690
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lilley, Jason; Bunnell, H. Timothy
  • Corresponding author:
    Bunnell, H. Timothy

Other publications by H. Timothy Bunnell

Reliable prediction of childhood obesity using only routinely collected EHRs may be possible
  • DOI:
    10.1016/j.obpill.2024.100128
  • Published:
    2024-12-01
  • Journal:
  • Impact factor:
  • Authors:
    Mehak Gupta; Daniel Eckrich; H. Timothy Bunnell; Thao-Ly T. Phan; Rahmatollah Beheshti
  • Corresponding author:
    Rahmatollah Beheshti

Similar National Natural Science Foundation of China (NSFC) Grants

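Forensic application of circadian small RNAs to inferring the time of bloodstain deposition
  • Award number:
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project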
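Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway to contribute to bortezomib-induced chronic pain
  • Award number:
    n/a
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project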
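Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund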
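Mechanisms by which small RNAs regulate the biocontrol functions of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Approval year:
    2019
  • Funding amount:
    CNY 580,000
  • Project type:
    General Program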
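Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Approval year:
    2019
  • Funding amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund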
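Functions and mechanisms of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Approval year:
    2018
  • Funding amount:
    CNY 560,000
  • Project type:
    General Program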
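Molecular mechanism of crop milk secretion in pigeons, analyzed by small RNA sequencing
  • Award number:
    31802058
  • Approval year:
    2018
  • Funding amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund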
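Pathogenic mechanism of rice grassy stunt virus via small RNA-mediated regulation of DNA methylation
  • Award number:
    31772128
  • Approval year:
    2017
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program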
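Immunoregulatory mechanisms of acupuncture treatment for Hashimoto's thyroiditis, studied with small RNA-seq
  • Award number:
    81704176
  • Approval year:
    2017
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund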
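Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its role in disease resistance
  • Award number:
    91640114
  • Approval year:
    2016
  • Funding amount:
    CNY 850,000
  • Project type:
    Major Research Plan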

Similar Overseas Grants

Powering Small Craft with a Novel Ammonia Engine
  • Award number:
    10099896
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Collaborative R&D
"Small performances": investigating the typographic punches of John Baskerville (1707-75) through heritage science and practice-based research
  • Award number:
    AH/X011747/1
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Research Grant
Fragment to small molecule hit discovery targeting Mycobacterium tuberculosis FtsZ
  • Award number:
    MR/Z503757/1
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Research Grant
Bacteriophage control of host cell DNA transactions by small ORF proteins
  • Award number:
    BB/Y004426/1
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Research Grant
Windows for the Small-Sized Telescope (SST) Cameras of the Cherenkov Telescope Array (CTA)
  • Award number:
    ST/Z000017/1
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Research Grant
CSR: Small: Leveraging Physical Side-Channels for Good
  • Award number:
    2312089
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Standard Grant
CSR: Small: Multi-FPGA System for Real-time Fraud Detection with Large-scale Dynamic Graphs
  • Award number:
    2317251
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Standard Grant
AF: Small: Problems in Algorithmic Game Theory for Online Markets
  • Award number:
    2332922
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Standard Grant
Collaborative Research: FET: Small: Algorithmic Self-Assembly with Crisscross Slats
  • Award number:
    2329908
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Standard Grant
NeTS: Small: ML-Driven Online Traffic Analysis at Multi-Terabit Line Rates
  • Award number:
    2331111
  • Fiscal year:
    2024
  • Funding amount:
    $104,100
  • Project type:
    Standard Grant