Realization of average-voice-based speech synthesis with diverse voices and speaking styles

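Basic Information

Grant number:
15300055
Principal investigator:
KOBAYASHI Takao
Amount:
$57,600
Host institution:
Tokyo Institute of Technology
Host institution country:
Japan
Grant category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2003
Funding country:
Japan
Period:
2003 to 2005
Status:
Completed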

Project Abstract

The purpose of this research is to realize text-to-speech synthesis that can generate speech in an arbitrarily given speaker's voice with diverse speaking styles and/or emotional expressions. We obtained the following results.

1. Speech synthesis with an arbitrary speaker's voice based on an average voice model
We proposed a new training method for the average voice model used in speech synthesis, in which an arbitrary speaker's voice is generated through speaker adaptation. We also proposed new speaker adaptation techniques based on the hidden semi-Markov model (HSMM), which models phone duration more precisely than the conventional hidden Markov model (HMM). Objective and subjective evaluation tests showed that average-voice-model-based speech synthesis can generate natural-sounding speech of the target speaker.

2. Speech synthesis with various speaking styles and emotional expressions
We proposed several approaches to realizing emotional expressivity and speaking-style variability in text-to-speech synthesis. We investigated two methods for modeling speaking styles and/or emotional expressions within an HMM-based speech synthesis framework, and then proposed approaches to adding various styles to synthetic speech, such as style interpolation, style morphing, style adaptation, and style control. Subjective experiments showed the effectiveness of the proposed approaches.

3. Prosody
We developed a robust fundamental frequency (F0) estimation and voiced/unvoiced decision technique based on the instantaneous frequency amplitude spectrum. We also proposed techniques for modeling phone duration and pauses for high-quality text-to-speech synthesis.
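
Item 1 above says the HSMM models phone duration more precisely than a conventional HMM. One way to see why: in an HMM the time spent in a state follows only from its self-transition probability a, giving the geometric distribution P(d) = a^(d-1)(1 - a), which always peaks at d = 1 frame; an HSMM instead attaches an explicit duration distribution to each state. The Python sketch below compares the two for a hypothetical state whose mean duration is 8 frames. The Gaussian duration density, its variance of 4, the 40-frame cap, and the function names are assumptions for illustration, not the project's implementation.

```python
import math

def hmm_duration_pmf(self_loop: float, d: int) -> float:
    """P(d) for one HMM state with self-transition probability a.

    Staying exactly d frames means d - 1 self-loops followed by one exit,
    so P(d) = a**(d - 1) * (1 - a): a geometric distribution.
    """
    return self_loop ** (d - 1) * (1.0 - self_loop)

def hsmm_duration_pmf(mean: float, var: float, d: int, max_d: int) -> float:
    """P(d) for an HSMM state with an explicit (assumed Gaussian) duration
    density, discretised and renormalised over d = 1..max_d."""
    def g(x: int) -> float:
        return math.exp(-((x - mean) ** 2) / (2.0 * var))
    total = sum(g(k) for k in range(1, max_d + 1))
    return g(d) / total

# Hypothetical state whose mean duration is 8 frames.
mean_dur = 8.0
a = 1.0 - 1.0 / mean_dur  # geometric mean duration 1 / (1 - a) = 8 frames
max_d = 40

hmm = [hmm_duration_pmf(a, d) for d in range(1, max_d + 1)]
hsmm = [hsmm_duration_pmf(mean_dur, 4.0, d, max_d) for d in range(1, max_d + 1)]

# Most likely duration (mode) under each model; index 0 corresponds to d = 1.
hmm_mode = 1 + hmm.index(max(hmm))
hsmm_mode = 1 + hsmm.index(max(hsmm))
print(f"HMM most likely duration:  {hmm_mode} frame(s)")
print(f"HSMM most likely duration: {hsmm_mode} frame(s)")
```

Both states have a mean duration of roughly 8 frames, but only the explicit duration model puts most of its probability mass near that mean; the HMM state is most likely to last a single frame.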

本研究的目的是实现文本到语音合成，可以生成语音与任意给定的扬声器的声音和不同的说话风格和/或情感表达。我们得到了以下结果。1.基于平均语音模型的任意说话人语音合成我们提出了一种新的平均语音模型训练方法，该方法基于说话人自适应生成任意说话人语音。我们还提出了新的说话人自适应技术的基础上隐藏的半马尔可夫模型（HSMM），可以更精确地模拟电话持续时间比传统的隐马尔可夫模型（HMM）。客观和主观评价测试结果表明，基于平均声模型的语音合成方法能够生成自然的目标说话人语音.多种说话风格和情感表达的语音合成本文提出了几种实现文本到语音合成中情感表达和说话风格可变性的方法。我们研究了两种基于HMM的语音合成框架的说话风格和/或情感表达建模方法，然后提出了一些方法来添加各种风格的合成语音，如风格插值，风格变形，风格自适应和风格控制技术。从主观实验的结果，我们已经证明了所提出的方法的有效性.韵律我们发展了一种基于瞬时频率幅度谱的稳健基频估计和清音/浊音确定技术。我们还提出了高质量的文本到语音合成的电话持续时间和暂停的建模技术。
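
The speaker adaptation in item 1 of the abstract maps the average voice model toward a target speaker. A common technique for this, and the one named in the MLLR entries listed below, is maximum likelihood linear regression, which transforms each Gaussian mean as μ' = Aμ + b. The sketch below estimates such a transform for the simplest case only: one-dimensional features, a single transform shared by all Gaussians, fixed variances, and state occupancies γ that are assumed to be already known (in practice they would come from an alignment over the HSMM). Under those assumptions the maximum likelihood estimate reduces to a weighted least-squares fit with weights γ/σ². The model values, the target data, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    mean: float  # mean of the average voice model (1-D feature, illustration only)
    var: float   # variance, kept fixed during adaptation

def estimate_mllr_1d(gaussians, frames):
    """Estimate a global 1-D MLLR mean transform mu' = a * mu + b.

    frames: list of (observation, {gaussian_index: occupancy}) pairs.
    The occupancies are assumed given here; with fixed variances the
    maximum likelihood solution is weighted least squares with
    weights occupancy / variance.
    """
    s_w = s_wm = s_wmm = s_wo = s_wom = 0.0
    for obs, occ in frames:
        for m, gamma in occ.items():
            g = gaussians[m]
            w = gamma / g.var
            s_w += w
            s_wm += w * g.mean
            s_wmm += w * g.mean * g.mean
            s_wo += w * obs
            s_wom += w * obs * g.mean
    # Solve the 2x2 normal equations for (a, b).
    det = s_wmm * s_w - s_wm * s_wm
    if abs(det) < 1e-12:
        raise ValueError("adaptation data do not determine the transform")
    a = (s_wom * s_w - s_wm * s_wo) / det
    b = (s_wmm * s_wo - s_wm * s_wom) / det
    return a, b

def adapt(gaussians, a, b):
    """Return a copy of the model with transformed means (variances unchanged)."""
    return [Gaussian(a * g.mean + b, g.var) for g in gaussians]

# Hypothetical average voice model with three Gaussians.
avg_model = [Gaussian(0.0, 1.0), Gaussian(1.0, 0.5), Gaussian(2.0, 2.0)]
# Hypothetical noise-free target-speaker frames generated by mu' = 1.5 * mu + 0.3.
frames = [(1.5 * avg_model[m].mean + 0.3, {m: 1.0}) for m in (0, 1, 2, 1, 0)]

a, b = estimate_mllr_1d(avg_model, frames)
print(round(a, 6), round(b, 6))  # 1.5 0.3
adapted = adapt(avg_model, a, b)
```

Because one transform is shared by every Gaussian, even Gaussians with little or no adaptation data are moved toward the target speaker, which is what makes adaptation from a small amount of speech feasible.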

Project Outputs

Journal articles (147)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

平均声に基づく音声合成のための話者適応アルゴリズムの検討

A study of speaker adaptation algorithms for average-voice-based speech synthesis

DOI:
Published:
2005
Journal:
日本音響学会2005年秋季研究発表会講演論文集 1-Q-11
Impact factor:
0
Authors:
中野雄資;緒方克海;磯貝朱里;山岸順一;小林隆夫
Corresponding author:
小林隆夫

隠れセミマルコフモデルに基づく適応学習アルゴリズム

An adaptive training algorithm based on hidden semi-Markov models

DOI:
Published:
2005
Journal:
日本音響学会2005年春季研究発表会講演論文集 (to be presented)
Impact factor:
0
Authors:
磯貝朱里;山岸順一;小林隆夫
Corresponding author:
山岸順一

MLLR adaptation for hidden semi-Markov model based speech synthesis

DOI:
Published:
2004
Journal:
Proc. 8th International Conference on Spoken Language Processing (ICSLP 2004)
Impact factor:
0
Authors:
磯貝朱里;山岸順一;小林隆夫;橘誠;野村大輔
Corresponding author:
山岸順一

隠れセミマルコフモデルに基づく音声合成システムにおける最尤線形回帰の検討

A study of maximum likelihood linear regression for an HSMM-based speech synthesis system

DOI:
Published:
2004
Journal:
日本音響学会2004年秋季研究発表会講演論文集 I
Impact factor:
0
Authors:
磯貝朱里;山岸順一;小林隆夫;橘誠;野村大輔;山岸順二
Corresponding author:
山岸順一

重回帰HSMMを用いた合成音声のスタイル制御

Style control of synthetic speech using a multiple-regression HSMM

DOI:
Published:
2006
Journal:
電子情報通信学会技術研究報告, SP2005-160, 105・572
Impact factor:
0
Authors:
Yoshihide Kato;Tomohiro Ohno;Nobuo Kawaguchi;Makoto Tachibana;能勢隆
Corresponding author:
能勢隆
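
The style control entry above (重回帰HSMMを用いた合成音声のスタイル制御) uses a multiple-regression HSMM, in which the mean vector of each state is modeled as a linear function of a low-dimensional style vector: μ = Hξ with ξ = [1, v_1, ..., v_L]ᵀ, where v_l sets the intensity of style l. Choosing v at synthesis time moves the output toward or away from each style. The sketch below only evaluates this mapping for one state; the regression matrix H, the two style axes, and the function name are hypothetical, and estimating H from training data is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def mr_hsmm_mean(H: Matrix, style: Sequence[float]) -> List[float]:
    """Mean vector of one state for a given style vector.

    Assumes the multiple-regression form mu = H @ xi with
    xi = [1, v_1, ..., v_L]; H has one row per feature dimension and
    L + 1 columns (bias column first).
    """
    xi = [1.0, *style]
    if any(len(row) != len(xi) for row in H):
        raise ValueError("H must have len(style) + 1 columns")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]

# Hypothetical 2-dimensional state with two style axes (e.g. "joyful", "sad").
H = [
    [5.0, 0.8, -0.5],  # dimension 0: bias, joyful weight, sad weight
    [1.0, 0.2, 0.4],   # dimension 1
]

neutral = mr_hsmm_mean(H, [0.0, 0.0])   # only the bias column contributes
joyful = mr_hsmm_mean(H, [1.0, 0.0])    # full "joyful" intensity
half_sad = mr_hsmm_mean(H, [0.0, 0.5])  # half "sad" intensity
print(neutral, joyful, half_sad)
```

Because the style enters only through ξ, intermediate or mixed styles need no extra models: any point in the style space, such as v = [0.5, 0.5], yields a valid mean vector.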