Deep architectures for statistical speech synthesis

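Basic information

  • Grant number:
    EP/J002526/1
  • Principal investigator:
  • Amount:
    $944,400
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Fellowship
  • Fiscal year:
    2011
  • Funding country:
    United Kingdom
  • Duration:
    2011 to (no data)
  • Project status:
    Completed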

Project abstract

Speech synthesis is the conversion of written text into speech output. Applications range from telephone dialogue systems to computer games and clinical applications. Current speech synthesis systems have a very limited range of different voices available. This is because it is complex and expensive to create them.

Unfortunately, that is a big problem for many interesting applications, including one we are focusing on in this proposal: assistive communication aids for people with vocal problems due to Motor Neurone Disease and other conditions. At the moment, these people are forced to use devices with inappropriate voices, very often in the wrong accent and sometimes even of the wrong sex! This is a disincentive for them to communicate, even with their own family, since they do not "own" the voice and it does not reflect their identity. The voice is an integral part of identity, and we are creating the technology to allow people to communicate in their own voice, when their natural speech has become hard to understand or they can no longer speak at all.

The technology we will develop has a lot of other applications too: it will enable a speech synthesiser to adjust not only the speaker identity but many other properties too. For example, adjusting speaking effort will simulate what human talkers do in noisy conditions to make their speech more intelligible. Our starting point is a technique we have pioneered, called speaker adaptation.

Speaker adaptation has proven to be highly successful in enabling the flexible transformation of the characteristics of a text-to-speech synthesis system, based on a small amount of recorded speech. It can be used for changing the characteristics of the speech to a different speaker or speaking style. However, current methods do not use any deep knowledge about speech and do not generalise across similar situations. This is considerably less natural and flexible than human speech production, in which speech is controlled by human talkers based simply on prior experience. For instance, we effortlessly adapt our speech in noisy environments, compared with quiet environments, in order to increase intelligibility. The current adaptation techniques that we have pioneered are completely automatic, but they do not enable this prior knowledge to be incorporated in a straightforward way.

In some preliminary work, we have developed a model which includes information about the movement of the speech articulators: the tongue, lips and so on. Then, using our knowledge of how humans alter their speech production in the presence of noise (hyper- and hypo-articulation), we have demonstrated that it is possible to improve the intelligibility of synthetic speech in noise.

The current proposal is to extend and generalise this preliminary work, in order to integrate many other types of knowledge about human speech into this model. We will develop a new model which allows us to include more information about how speech is produced, as well as information about how it is perceived and how external factors, such as background noise, affect speech.

One important application of this technology is to create personalised speech synthesis for people with disordered speech (caused by Motor Neurone Disease, for example). Current technology for creating voices does not work for these people, because their speech is usually already disordered. Our technique can actually correct this, and produce speech which sounds like the person, but is more intelligible than their current natural speech. We have already produced a proof-of-concept system demonstrating that this works. The current proposal will make the technology available and affordable to a wide range of people.

Project outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Analysis of speaker clustering strategies for HMM-based speech synthesis
Glottal Spectral Separation for Speech Synthesis
Initial investigation of speech synthesis based on complex-valued neural networks
A fixed dimension and perceptually based dynamic sinusoidal model of speech
  • DOI:
    10.1109/icassp.2014.6854810
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Hu Q
  • Corresponding author:
    Hu Q
Reactive Control of Expressive Speech Synthesis Using Kinect Skeleton Tracking
  • DOI:
  • Published:
    2012-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    MagdalenaAnnaKonkiewicz; Astrinak Maria; Yamagishi Junichi
  • Corresponding author:
    MagdalenaAnnaKonkiewicz; Astrinak Maria; Yamagishi Junichi

Other publications by Junichi Yamagishi

Modeling and evaluation methods in current voice conversion tasks
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rohan Kumar Das; Tomi Kinnunen; Wen-Chin Huang; Zhen-Hua Ling; Junichi Yamagishi; Zhao Yi; Xiaohai Tian; Tomoki Toda; Yi Zhao; Xin Wang; Lauri Juvela; Junichi Yamagishi; Yi Zhao
  • Corresponding author:
    Yi Zhao
日本中世におけるアブラナ科作物と仏教文化
Brassicaceae crops and Buddhist culture in medieval Japan
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Fuming Fang; Junichi Yamagishi; Isao Echizen; Md Sahidullah; Tomi Kinnunen; Motoshi Suzuki and Azusa Uji; 横内裕人
  • Corresponding author:
    横内裕人
Collision Resistance of Double-Block-Length Hash Function against Free-Start Attack
  • DOI:
  • Published:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Seungzoo Jeong; Woong Choi; Naoki Hashimoto; Makoto Sato; 川島啓吾; Junichi Yamagishi; Yuji Nakano; 井澤信介; 橘 誠; 大川高志; Naotake Niwase; Takashi Yamazaki; S. Hirose
  • Corresponding author:
    S. Hirose
Deepfakeの生成と検出の現状
The current state of deepfake generation and detection
CNNを活用したモバイルアプリケーション利用時のユーザ移動状態推定の精度評価
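Accuracy evaluation of CNN-based estimation of user movement state during mobile application use
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xin Wang; Shinji Takaki; Junichi Yamagishi; 川上 航・金井謙治・Bo Wei・甲藤二郎
  • Corresponding author:
    川上 航・金井謙治・Bo Wei・甲藤二郎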

Similar overseas grants

Highly Tunable Brush-Like Polymer Architectures to Control Therapeutic Delivery and Cell-Material Interactions
  • Grant number:
    10669252
  • Fiscal year:
    2022
  • Funding amount:
    $944,400
  • Project type:
Neuroimage-driven biophysical inverse problems for atrophy and tau propagation
  • Grant number:
    10302105
  • Fiscal year:
    2021
  • Funding amount:
    $944,400
  • Project type:
From common to rare variant functional architectures of human diseases
  • Grant number:
    10209027
  • Fiscal year:
    2020
  • Funding amount:
    $944,400
  • Project type:
From common to rare variant functional architectures of human diseases
  • Grant number:
    10237415
  • Fiscal year:
    2020
  • Funding amount:
    $944,400
  • Project type:
From common to rare variant functional architectures of human diseases
  • Grant number:
    10408102
  • Fiscal year:
    2020
  • Funding amount:
    $944,400
  • Project type:
Using Integrative Networks to Explore Heterogeneous Phenotypes in COPD
  • Grant number:
    9320981
  • Fiscal year:
    2016
  • Funding amount:
    $944,400
  • Project type:
Using Integrative Networks to Explore Heterogeneous Phenotypes in COPD
  • Grant number:
    9164450
  • Fiscal year:
    2016
  • Funding amount:
    $944,400
  • Project type:
Harnessing Scalable Libraries for Statistical Computing on Modern Architectures and Bringing Statistics to Large Scale Computing
  • Grant number:
    1418195
  • Fiscal year:
    2014
  • Funding amount:
    $944,400
  • Project type:
    Continuing Grant
Strategies and Techniques for Analyzing Microbial Population Structures
  • Grant number:
    9273561
  • Fiscal year:
    2013
  • Funding amount:
    $944,400
  • Project type:
Strategies and Techniques for Analyzing Microbial Population Structures
  • Grant number:
    8548735
  • Fiscal year:
    2013
  • Funding amount:
    $944,400
  • Project type: