
High-quality Speech Synthesis based on Accurate Analysis Method and Statistical Method

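Basic Information

Grant number:
12480079
Principal investigator:
HIROSE Keikichi
Amount:
$64,000
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2000
Funding country:
Japan
Project period:
2000 to 2002
Project status:
Completed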

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-12480079/
Keywords:
Statistical Speech Synthesis; Terminal Analogue Synthesis; Waveform Concatenative Synthesis; HMM Speech Synthesis; AR-HMM Model; Fundamental Frequency Contour; Generation Process Model; Emotional Speech Synthesis; Waveform Concatenation Synthesis; Glottal Source Waveform Model; Formant Estimation; 統

Project Abstract

The original research plan aimed to realize high-quality speech synthesis by using accurate pole-zero information of the vocal tract transfer function for segmental feature generation and by applying functional model constraints to prosodic feature generation. The plan was accomplished with the following results:

1. A successive approximation was applied to ARX analysis, enabling accurate pole-zero estimation. The method was combined with our previously developed terminal analogue synthesizer to construct an analysis-synthesis workbench. Using this workbench, we succeeded in improving the quality of liquid sounds.
2. A hybrid speech synthesizer combining terminal analogue synthesis and waveform concatenation was developed, realizing high-quality speech synthesis.
3. A method for stable formant extraction was developed, based on AR-HMM modeling, in which the source waveform is represented by an HMM. Speech synthesis experiments showed that the method could generate high-quality speech even for a large change in F0 (fundamental frequency).
4. By adding the natural waveform of junction periods to the concatenated speech in the spectral domain with appropriate weighting, we realized a smooth spectral transition. We also developed a method that effectively reduces the corpus size for concatenative synthesis by weighted VQ according to frequency.
5. After an HMM speech synthesizer was developed, the data size necessary for speaker adaptation was investigated from the viewpoint of speech quality. Good quality was shown to be obtainable with 10 or more sentences.
6. F0 contour generation was realized by estimating the parameters of the generation process model using statistical methods. High speech quality was achieved from only a small speech corpus by using linguistic information such as the direct modification relations between words. We also succeeded in estimating accent phrase boundaries from text using the same statistical framework. Furthermore, F0 contour generation and phoneme length estimation were realized for emotional speech with good results.
7. A method for automatically estimating the commands of the F0 contour generation process model was realized. Using this method, a prosodic corpus was built. This corpus is indispensable for the F0 contour generation described above.
8. A rule for controlling mora duration in dialogue-like speech synthesis was constructed. A speech synthesis experiment confirmed the validity of the rule.
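
The abstract does not give the equations of the F0 contour generation process model. As a rough illustration only, the sketch below assumes the model is of the command-response type commonly called the Fujisaki model, in which log F0 is a baseline plus responses to phrase commands (impulses) and accent commands (step-like pulses). The constants `ALPHA`, `BETA`, `GAMMA`, the function names, and the example command values are all illustrative assumptions, not values taken from the project.

```python
import math

# Assumed, commonly quoted constants for a command-response F0 model.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before onset."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times`.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for t0, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - t0)
        for t1, t2, amplitude in accent_cmds:
            log_f0 += amplitude * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical example: one phrase command at 0 s, one accent from 0.3 s to 0.8 s.
times = [i * 0.01 for i in range(200)]
f0 = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
```

In a setup like this, the automatic command estimation in result 7 would correspond to recovering `phrase_cmds` and `accent_cmds` from an observed contour, and the statistical methods in result 6 to predicting them from linguistic information; neither step is shown here.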
