
Exemplar-based Expressive Speech Synthesis

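Basic information

Grant number:
EP/V046772/1
Principal investigator:
Anton Ragni
Amount:
$278,100
Host institution:
University of Sheffield
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2021
Funding country:
United Kingdom
Start and end dates:
2021 to (no data)
Project status:
Completed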

Source:
https://gtr.ukri.org/projects?ref=EP%2FV046772%2F1
Keywords:
Exemplar based Expressive Speech Synthesis

Project abstract

Synthetic voices are becoming ubiquitous: 'smart' speakers at home, announcement systems on public transport, and voice-enabled assistants on call lines. There exists a strong public demand for 'smarter' assistants capable of laughing at our jokes; interacting with our children as encouraging and empathetic tutors; calling to check up on our parents; providing a reassuring 'ear' for an isolated person; and offering calming and supportive virtual therapy. To support current and future applications, voice synthesis technology needs to satisfy a number of requirements. First, it needs to be customisable for rapid research and development. Second, it needs to be able to produce any spoken content, including expressive voice characteristics. However, none of the current synthesis technologies can satisfy all of these requirements simultaneously. For instance, current non-machine-learning approaches allow pre-recorded phrases to be combined efficiently into complete sentences, but any missing phrase must be recorded first, which limits their flexibility and efficiency. Current machine learning models, on the other hand, can seamlessly synthesise any spoken content, but creating such models is a very costly, time-consuming and computationally demanding process. Furthermore, these models offer very limited control over voice characteristics and lack interpretability, even though both properties are highly desirable in research and commercial settings.

In this project, the objective is to develop a computationally efficient, customisable, expressive and interpretable approach to speech synthesis by drawing on the concept of 'exemplars' in cognitive science.

In cognitive science, the notions of 'exemplars' and 'prototypes' form part of a prominent view on how humans categorise concepts. In particular, exemplar theory argues that singular examples, rather than prototypes (an average of examples), form the basic building blocks of how we understand and interact with the world. The key argument in favour of exemplar theory is our ability as humans to solve complex tasks based on just a few examples, which makes the theory appealing for applications that involve complex phenomena or require high computational efficiency. Furthermore, expressive speech synthesis combines expressivity and speech production, two complex phenomena that remain poorly understood. Unlike prototype theory, exemplar theory makes it possible, at least in theory, to produce expressive speech, provided that at least one recording of the desired spoken content and one recording featuring the desired expressivity are available. Lastly, adopting exemplar theory promotes transparency in the decision-making process through the use of real examples that can be inspected, modified, replaced, added, etc. within the task.

The objective will be achieved through three innovative means: i) formulating a methodological framework for exemplar-based speech synthesis, ii) building an exemplar-based representation for speech expressivity from pre-recorded examples, and iii) presenting a novel methodology for integrating this expressivity-based representation into the framework of i).
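
The distinction between prototypes (an average of examples) and exemplars (the individual examples themselves) can be illustrated with a toy categorisation sketch. This is not the project's method: the two-dimensional feature vectors, the category labels "calm" and "excited", and the function names are all hypothetical and chosen only to make the contrast visible.

```python
from math import dist
from statistics import fmean

# Hypothetical 2-D feature vectors per category (illustrative values only).
EXAMPLES = {
    "calm": [(0.0, 0.0), (4.0, 0.0)],
    "excited": [(2.0, 3.0)],
}


def prototype_classify(x, examples):
    """Prototype view: compare x with the average of each category's examples."""
    prototypes = {
        label: tuple(fmean(coord) for coord in zip(*points))
        for label, points in examples.items()
    }
    return min(prototypes, key=lambda label: dist(x, prototypes[label]))


def exemplar_classify(x, examples):
    """Exemplar view: compare x with every stored example individually.

    Returns the winning label together with the concrete example that
    produced the decision, so the decision can be inspected.
    """
    return min(
        ((label, p) for label, points in examples.items() for p in points),
        key=lambda pair: dist(x, pair[1]),
    )


query = (2.0, 1.4)
print(prototype_classify(query, EXAMPLES))
print(exemplar_classify(query, EXAMPLES))
```

In this toy setting the two views disagree on the query point: the averaged "calm" prototype lies closest, while the single closest stored example belongs to "excited". The exemplar version also returns the specific example behind its decision, which mirrors the transparency argument made in the abstract.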

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Anton Ragni

Adapting Pretrained Models for Adult to Child Voice Conversion

DOI:
Publication year:
2023
Venue:
European Signal Processing Conference
Impact factor:
0
Authors:
Protima Nomo Sudro; Anton Ragni; Thomas Hain
Corresponding author:
Thomas Hain

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

DOI:
Publication year:
2024
Venue:
Impact factor:
0
Authors:
Wing; Mattias Cross; Anton Ragni; Stefan Goetze
Corresponding author:
Stefan Goetze

Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

DOI:
10.48550/arxiv.2401.13611
Publication year:
2024
Venue:
ArXiv
Impact factor:
0
Authors:
Rhiannon Mogridge; George Close; Robert Sutherland; Thomas Hain; Jon Barker; Stefan Goetze; Anton Ragni
Corresponding author:
Anton Ragni