
自然なヒューマンコンピュータインタラクションのための話し言葉会話音声合成

Spoken-style conversational speech synthesis for natural human-computer interaction

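Basic information

Grant number:
13J08776
Principal investigator:
Tomoki Koriyama (郡山知樹)
Amount:
$14,700
Host institution:
Tokyo Institute of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for JSPS Fellows
Fiscal year:
2013
Funding country:
Japan
Project period:
2013-04-01 to 2015-03-31
Project status:
Completed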

Project abstract

Until now, speech synthesis research has focused mainly on read-aloud and announcer-style speech. Recent studies report that emotional expressions and speaking styles such as joy or anger can be reproduced at relatively low cost, but it is not yet possible to synthesize natural spoken-style speech of the kind used in everyday conversation. The reason is that database construction, the choice of explanatory variables for speech, and modeling methods have not been examined sufficiently for realizing the diverse expressions found in spontaneous conversational speech, such as utterance intentions (questioning, confirmation) and fillers ("ah", "uh-huh").

As a way of applying hidden Markov model based speech synthesis (HMM speech synthesis) to spoken-style speech, the principal investigator proposed a modeling method whose units are prosodic events, such as the rising intonation found in questions, in contrast to the conventional phoneme-unit modeling. However, because HMM speech synthesis is constrained to state-level modeling, this did not lead to the generation of natural conversational speech. This study therefore proposed a new speech synthesis method based on Gaussian process regression (GPR speech synthesis), which models speech frame by frame rather than state by state as in HMMs. For read-aloud speech, the spectrum (representing phonetic content) and F0 (representing prosody) were modeled, and the method was shown to synthesize speech with higher naturalness than conventional HMM speech synthesis. GPR speech synthesis is a highly flexible method, and because input variables specific to spoken language are easy to introduce, it is expected to lead to improved naturalness of conversational speech.
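The frame-level idea in the abstract can be illustrated with a minimal Gaussian process regression sketch. This is not the project's implementation: the squared-exponential kernel, the hyperparameter values, the single scalar input (a hypothetical "relative position within a segment"), and the synthetic log-F0 contour are all assumptions chosen for illustration, and the sketch uses exact inference rather than the sparse approximations named in the publication titles.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of frame-level inputs."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at the test frames."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve (K + noise*I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, var

# Synthetic training frames (assumption): one input per frame, the relative
# position in a segment, and a noisy log-F0 target per frame.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 40)[:, None]
true_contour = lambda x: 5.0 + 0.3 * np.sin(2.0 * np.pi * x)
y_train = true_contour(x_train[:, 0]) + 0.01 * rng.standard_normal(40)

# Remove the mean so the zero-mean GP prior is reasonable, then add it back.
y_mean = y_train.mean()
x_test = np.linspace(0.0, 1.0, 100)[:, None]
mean, var = gpr_predict(x_train, y_train - y_mean, x_test)
log_f0 = mean + y_mean
```

Each output frame gets its own prediction from the kernel similarity between its input and the training frames, which is the contrast with state-level HMM modeling that the abstract draws. In a real system the input would be a vector of frame-level context features, and adding a feature specific to spoken language would only change the input dimension.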

Project outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Statistical nonparametric speech synthesis using sparse Gaussian processes

DOI:
10.21437/interspeech.2013-121
Publication date:
2013
Journal:
Impact factor:
0
Authors:
Tomoki Koriyama; Takashi Nose; Takao Kobayashi
Corresponding authors:
Tomoki Koriyama; Takashi Nose; Takao Kobayashi

Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization

DOI:
10.1109/icassp.2014.6854319
Publication date:
2014
Journal:
Proceedings of 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing
Impact factor:
0
Authors:
Tomoki Koriyama; Takashi Nose; Takao Kobayashi
Corresponding author:
Takao Kobayashi

スパース近似と畳み込みカーネルを用いたガウス過程回帰に基づく音声合成

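Speech synthesis based on Gaussian process regression using sparse approximation and convolution kernels

DOI:
Publication date:
2013
Journal:
日本音響学会2013年秋季研究発表会講演論文集 (Proceedings of the Acoustical Society of Japan 2013 Autumn Meeting)
Impact factor:
0
Authors:
Tomoki Koriyama (郡山知樹); Takashi Nose (能勢隆); Takao Kobayashi (小林隆夫)
Corresponding author:
Takao Kobayashi (小林隆夫)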

Prosody generation using frame-based Gaussian process regression and classification for statistical parametric speech synthesis

DOI:
10.1109/icassp.2015.7178908
Publication date:
2015-04
Journal:
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Tomoki Koriyama; Takao Kobayashi
Corresponding authors:
Tomoki Koriyama; Takao Kobayashi

Statistical Parametric Speech Synthesis Based on Gaussian Process Regression