RI: Small: Creating Text-to-Speech Synthesis for Low Resource Languages

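Basic Information

Award number:
1717680
Principal investigator:
Julia Hirschberg
Amount:
$0.5 million
Host institution:
Columbia University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2017
Funding country:
United States
Period:
2017-09-01 to 2021-08-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1717680&HistoricalAwards=false
Keywords:
RI Small Creating Text Speech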

Project Abstract

Recent advances in speech technology have resulted in wide use of Spoken Dialogue Systems (SDS) such as Siri (iPhone) and Google Assistant (Android). These systems support major improvements in information access by voice for High Resource Languages (HRLs) such as English, French, Mandarin, Japanese, and Spanish. For these languages, researchers have built dictionaries, parsers, part-of-speech taggers, language models, search engines, and machine translation engines to support speech technologies. However, there are ~6500 world languages, including Tagalog, Tamil, Swahili, Vietnamese, and Pashto, many of which are spoken by millions of people but do not enjoy the computational resources necessary to build SDS. These are termed Low Resource Languages (LRLs). Speakers of LRLs do not benefit from the same communication and search capabilities that speakers of HRLs do. In particular, there is little research and there are few resources supporting the development of Text-to-Speech Synthesis (TTS) systems to produce Siri-like speech for SDS in these languages. Furthermore, both commercial and research TTS systems require large amounts of carefully recorded, single-speaker speech data, creating another major (and expensive) barrier to TTS development for LRLs. This work will create TTS systems in LRLs and, in the process, create and make available tools for others to create their own systems using "found" data: data recorded for other purposes or available on the web.
New paradigms for TTS synthesis (parametric synthesis and the use of Deep Neural Nets) are now being developed which make it theoretically possible to build systems quickly and cheaply without recording large, special-purpose speech corpora, instead using data recorded for other purposes such as training speech recognizers. This work will investigate the use of these techniques to produce TTS systems for LRLs.
Two major problems will be explored: 1) What are the best techniques to filter found data (removing data that is too loud, too noisy or disfluent, for example) to obtain intelligible and natural-sounding results? 2) Can basic prosodic features of LRLs such as phrasing and emphasis be identified, using crowdsourcing and tools developed for HRLs? Pilot studies on English have revealed that more natural and intelligible voices can be created by using subsets of the data selected on features such as pitch variation and level of articulation. These methods will be tested on LRLs such as Turkish, Amharic, and Telugu. Evaluations will be made in terms of intelligibility and naturalness both automatically and using crowdsourcing techniques with native speakers of each language. The ultimate goal of this exploratory work will be to test these techniques on a broad variety of LRLs which have been collected for purposes of developing speech recognizers.
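The found-data filtering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the `Utterance` fields, the threshold values, and the `select_utterances` helper are all hypothetical, and the abstract does not say whether lower or higher pitch variation is preferable, so the ranking direction is left as a parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    """One found-data utterance with precomputed acoustic features (all hypothetical)."""
    utt_id: str
    loudness_db: float   # assumed mean level in dB relative to full scale
    snr_db: float        # assumed signal-to-noise ratio estimate in dB
    f0_std_hz: float     # assumed pitch variation: standard deviation of F0 in Hz


def select_utterances(utts, max_loudness_db=-10.0, min_snr_db=15.0,
                      keep_fraction=0.5, prefer_low_pitch_variation=True):
    """Drop utterances that are too loud or too noisy, then keep a ranked subset.

    The thresholds are illustrative defaults, not values from the project.
    """
    # Step 1: hard filters remove clearly unusable recordings.
    clean = [u for u in utts
             if u.loudness_db <= max_loudness_db and u.snr_db >= min_snr_db]
    # Step 2: rank the survivors by pitch variation in the chosen direction.
    ranked = sorted(clean, key=lambda u: u.f0_std_hz,
                    reverse=not prefer_low_pitch_variation)
    # Step 3: keep the top fraction of the ranked list (rounded up).
    n_keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:n_keep]


corpus = [
    Utterance("a", loudness_db=-20.0, snr_db=30.0, f0_std_hz=10.0),
    Utterance("b", loudness_db=-5.0, snr_db=30.0, f0_std_hz=5.0),   # too loud
    Utterance("c", loudness_db=-20.0, snr_db=5.0, f0_std_hz=5.0),   # too noisy
    Utterance("d", loudness_db=-22.0, snr_db=25.0, f0_std_hz=40.0),
    Utterance("e", loudness_db=-18.0, snr_db=20.0, f0_std_hz=20.0),
]
selected_ids = [u.utt_id for u in select_utterances(corpus)]
```

In practice the features would be extracted from audio with a speech-analysis toolkit, and the choice of thresholds and ranking feature would be tuned against intelligibility and naturalness ratings such as those the abstract describes.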

Project Outcomes

Journal papers (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Characteristics of Text-to-Speech and Other Corpora

DOI:
10.21437/speechprosody.2018-140
Published:
2018-06
Journal:
Speech Prosody 2018
Impact factor:
0
Authors:
Erica Cooper; E. Li; Julia Hirschberg
Corresponding authors:
Erica Cooper; E. Li; Julia Hirschberg

Adaptation and Frontend Features to Improve Naturalness in Found-Data Synthesis

DOI:
10.21437/speechprosody.2018-160
Published:
2018
Journal:
Speech Prosody 2018
Impact factor:
0
Authors:
Cooper, Erica; Hirschberg, Julia
Corresponding author:
Hirschberg, Julia

Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

DOI:
10.21437/ssw.2019-48
Published:
2019-09
Journal:
10th ISCA Workshop on Speech Synthesis (SSW 10)
Impact factor:
0
Authors:
Rose Sloan; S. S. Akhtar-S.; Bryan Li; Ritvik Shrivastava; Agustin Gravano; Julia Hirschberg
Corresponding authors:
Rose Sloan; S. S. Akhtar-S.; Bryan Li; Ritvik Shrivastava; Agustin Gravano; Julia Hirschberg

A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis

DOI:
10.21437/interspeech.2018-1313
Published:
2018
Journal:
Interspeech 2018
Impact factor:
0
Authors:
Kai-Zhan Lee, Erica Cooper
Corresponding authors:
Kai-Zhan Lee, Erica Cooper

Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data

DOI:
10.21437/interspeech.2017-465
Published:
2017
Journal:
Interspeech 2017
Impact factor:
0
Authors:
Erica Cooper, Xinyue Wang
Corresponding authors:
Erica Cooper, Xinyue Wang

Other publications by Julia Hirschberg

A Novel Methodology for Developing Automatic Harassment Classifiers for Twitter

DOI:
10.18653/v1/2020.alw-1.2
Published:
2020
Journal:
Workshop on Abusive Language Online
Impact factor:
0
Authors:
Ishaan Arora; Julia Guo; Sarah Ita Levitan; Susan E Mcgregor; Julia Hirschberg
Corresponding author:
Julia Hirschberg

Detecting Inappropriate Clarification Requests in Spoken Dialogue Systems

DOI:
10.3115/v1/w14-4331
Published:
2014
Journal:
Impact factor:
0
Authors:
Alex Liu; Rose Sloan; M. Then; Svetlana Stoyanchev; Julia Hirschberg; Elizabeth Shriberg
Corresponding author:
Elizabeth Shriberg