VocaliD SBIR Phase II: Optimized Speech Corpora for Personalized Speech Synthesis

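Basic information

  • Grant number:
    9408604
  • Principal investigator:
  • Amount:
    $608,300
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-06-01 to 2019-06-30
  • Project status:
    Completed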

Project abstract

Our voices are not identical; they are our identities. The human voice is a powerful signal that conveys one's age, gender, size, ethnicity, and personality, among other attributes. Yet, until now, users of augmentative and alternative communication (AAC) devices, screen reading technologies, and other text-to-speech (TTS) applications have relied on a limited set of mass-produced, generic-sounding synthetic voices. This mismatch in vocal identity impacts educational outcomes, infringes on personal safety, and hinders social integration. Conventional methods for building a synthetic voice require a voice actor to record an extensive dataset of studio-quality recordings, which are used to train a computational model and generate the output voice. The process is time- and labor-intensive and thus inaccessible to everyday consumers, let alone those with speech impairment. VocaliD Inc.'s award-winning technology offers an unprecedented means to build custom-crafted synthetic voices that reflect the recipient by combining his/her own residual vocalizations with recordings of a matched speaker from our Human Voicebank. We have discovered that even a single vowel contains enough "vocal DNA" to seed the personalization process. VocaliD's custom voice sounds like the recipient in age, personality, and vocal identity but is as clear and understandable as the donor's recordings. To create an affordable and efficient method of voice personalization, we leverage the penetration of high-quality microphones and recording software on consumer-grade computers, together with increased technological literacy, to crowdsource the collection of speech and voice recordings. This enables engagement across broad age, socioeconomic, cultural, and linguistic groups in order to truly sample the diversity of the human voice. The challenges, however, are to ensure high-quality recordings and to sufficiently engage speech donors to complete the recording corpus.
This Phase II project builds upon our success in Phase I to reduce the length of the donor corpus and to streamline and automate the recipient protocol. Results of our perceptual experiments indicated that while we were able to reduce the length of the donor corpus by 70%, it came at the cost of reduced intelligibility and naturalness. Since voice quality is vital to acceptance and adoption of our voices, this Phase II proposal is aimed at improving the clarity and expressiveness of our voices while maintaining the optimized corpus length. We propose to improve TTS intelligibility by developing methods to mitigate the effects of background noise and reverberation during donor and recipient recordings and by aligning expected and actual spoken transcripts to reduce errors in TTS model building (Aim 1). To address the issue of TTS naturalness, we propose to modify the donor corpus to include more prosodically diverse contrasts and to adapt the donor protocol to elicit natural melodic intonation and phrasing (Aim 2). These advances will yield a scalable and cost-effective method of personalized voice creation that will humanize speech-enabled technologies for AAC and beyond.
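
As a rough illustration of the transcript-alignment idea in Aim 1, the sketch below compares the prompt a donor was asked to read with what was actually spoken and reports the word-level differences. It is only a minimal example built on Python's standard `difflib` module; the function name `align_transcripts` is hypothetical, the "actual" transcript is assumed to come from some speech recognizer that is not shown, and nothing here is taken from VocaliD's own pipeline.

```python
from difflib import SequenceMatcher


def align_transcripts(expected: str, actual: str):
    """Return word-level mismatches between a prompt and a spoken transcript.

    Hypothetical helper for illustration only. Each result is a tuple
    (operation, expected_words, actual_words), where operation is one of
    "replace", "delete", or "insert" as reported by difflib.
    """
    exp_words = expected.lower().split()
    act_words = actual.lower().split()
    # autojunk=False keeps difflib from discarding frequent words as "junk".
    matcher = SequenceMatcher(a=exp_words, b=act_words, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((tag, exp_words[i1:i2], act_words[j1:j2]))
    return mismatches


if __name__ == "__main__":
    # The donor skipped the word "brown" while reading the prompt.
    print(align_transcripts("the quick brown fox", "the quick fox"))
```

A recording whose mismatch list is empty could be passed on to model building unchanged, while one with deletions or substitutions could be flagged for re-recording or for transcript correction; where to draw that line is a design decision the abstract does not specify.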

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by RUPAL PATEL

Multimodal Speech Translation for Assistive Communication
  • Grant number:
    8737379
  • Fiscal year:
    2014
  • Amount:
    $608,300
  • Project category:
Multimodal Speech Translation for Assistive Communication
  • Grant number:
    8913172
  • Fiscal year:
    2014
  • Amount:
    $608,300
  • Project category:
Prosody in Congenital and Acquired Dysarthria.
  • Grant number:
    8636737
  • Fiscal year:
    2013
  • Amount:
    $608,300
  • Project category:
Prosody in Congenital and Acquired Dysarthria.
  • Grant number:
    8763936
  • Fiscal year:
    2013
  • Amount:
    $608,300
  • Project category:
Acquisition of Prosodic Control in Typically Developing Children
  • Grant number:
    8048490
  • Fiscal year:
    2011
  • Amount:
    $608,300
  • Project category:
Acquisition of Prosodic Control in Typically Developing Children
  • Grant number:
    8207838
  • Fiscal year:
    2011
  • Amount:
    $608,300
  • Project category:
Identifying Communicative Signals in Dysarthric Speech
  • Grant number:
    6794230
  • Fiscal year:
    2004
  • Amount:
    $608,300
  • Project category:
Identifying Communicative Signals in Dysarthric Speech
  • Grant number:
    6866487
  • Fiscal year:
    2004
  • Amount:
    $608,300
  • Project category:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Fellowship
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Continuing Grant
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Research Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Amount:
    $608,300
  • Project category:
    Research Grant