Language-independent, multi-modal, and data-efficient approaches for speech synthesis and translation

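Basic Information

Grant number:
21K11951
Principal investigator:
Cooper Erica
Amount:
$26,600
Host institution:
National Institute of Informatics
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2021
Funding country:
Japan
Project period:
2021-04-01 to 2024-03-31
Project status:
Completed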

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-21K11951/
Keywords:
speech synthesis; self-supervised learning; low-resource languages; speech assessment; mean opinion score; text-to-speech; vocoder; pruning; efficiency; multi-lingual; machine translation; deep learning; neural networks

Project Abstract

In the second year of the project, we looked at two main topics: language-independent, data-efficient text-to-speech synthesis for low-resource languages using self-supervised speech representations, and automatic mean opinion score prediction.

Self-supervised representations for speech have proven useful for many downstream speech-related tasks, and have been shown to contain phonetic information. We therefore chose them as an intermediate representation for text-to-speech synthesis trained on data from many languages, which can then be fine-tuned to a new language using only a small amount of data. This work is ongoing, and we are collaborating with researchers from the National Research Council of Canada and the University of Edinburgh.

We have also identified automatic evaluation of synthesized speech as an important topic for low-resource languages, since finding listeners to participate in listening tests can be especially difficult for these languages. In collaboration with Nagoya University and Academia Sinica, we co-organized the first VoiceMOS Challenge, a shared task for automatic mean opinion score (MOS) prediction for synthesized speech. The challenge attracted 22 participating teams from academia and industry, and we ran a special session about the challenge at Interspeech 2022. This challenge has advanced the field by generating a great deal of interest in this topic.

Project Outcomes

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

University of Edinburgh (UK)

DOI:
Publication date:
Journal:
Impact factor:
0
Authors:
Corresponding author:

On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

DOI:
10.1109/icassp43922.2022.9747728
Publication date:
2021-10
Journal:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Cheng-I Lai;Erica Cooper;Yang Zhang;Shiyu Chang;Kaizhi Qian;Yiyuan Liao;Yung-Sung Chuang;Alexander H. Liu;J. Yamagishi;David Cox;James R. Glass
Corresponding author:
Cheng-I Lai;Erica Cooper;Yang Zhang;Shiyu Chang;Kaizhi Qian;Yiyuan Liao;Yung-Sung Chuang;Alexander H. Liu;J. Yamagishi;David Cox;James R. Glass

The VoiceMOS Challenge 2022

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Wen-Chin Huang;Erica Cooper;Yu Tsao;Hsin-Min Wang;Tomoki Toda;Junichi Yamagishi
Corresponding author:
Junichi Yamagishi

The VoiceMOS Challenge 2022 website

DOI:
Publication date:
Journal:
Impact factor:
0
Authors:
Corresponding author:

Generalization Ability of MOS Prediction Networks

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Erica Cooper;Wen-Chin Huang;Tomoki Toda;Junichi Yamagishi
Corresponding author:
Junichi Yamagishi

Other publications by Cooper Erica

そのエージェントの声、合っていますか？-声質変換技術と印象適合・人工感制御-

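Does That Agent's Voice Fit? Voice Conversion Technology, Impression Matching, and Control of Artificiality

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Zhang Lin;Wang Xin;Cooper Erica;Yamagishi Junichi;齋藤大輔
Corresponding author:
齋藤大輔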

Multi-task Learning in Utterance-level and Segmental-level Spoof Detection

DOI:
10.21437/asvspoof.2021-2
Publication date:
2021
Journal:
Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge
Impact factor:
0
Authors:
Zhang Lin;Wang Xin;Cooper Erica;Yamagishi Junichi
Corresponding author:
Yamagishi Junichi