Everyday conversation speech synthesis

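Basic information

Grant number: 22K12107
Principal investigator: Hiroki Mori (森大毅)
Amount: USD 25,800
Host institution: Utsunomiya University
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research (C)
Fiscal year: 2022
Funding country: Japan
Project period: 2022-04-01 to 2025-03-31
Project status: Not yet concluded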

Source: https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22K12107/
Keywords: spontaneous speech; conversational speech; conversational speech synthesis; prosody

Project abstract

The purpose of this research is to synthesize high-quality conversational speech using the Corpus of Everyday Japanese Conversation (CEJC). When end-to-end speech synthesis is applied to a corpus with poor recording quality, such as CEJC, the model reproduces the poor sound as it is. In this research, CEJC is used only to train the prosody model, while a separate high-quality speech corpus is used to train the spectral model. The aim is speech synthesis that has the prosody of conversational speech while keeping quality equivalent to that of read-speech synthesis.

In fiscal year 2022 (Reiwa 4), the study of layered modeling of prosody and spectrum in end-to-end speech synthesis was carried out ahead of schedule. Before introducing an independent neural fo (fundamental frequency) model, we first attempted to train the FastSpeech 2 variance adaptor (the fo, intensity, and duration predictors) separately. As preprocessing, because recording levels in CEJC are not controlled, we applied amplitude normalization so that the mean intensity matches on a per-session basis. Telephone speech turned out to differ considerably in the manner of vocalization, so it was excluded from the initial study. Speech synthesized by a FastSpeech 2 model trained simply on CEJC was found to reflect well the prosodic characteristics of the utterances we produce in daily life. We therefore took this model as the initial state and studied a method of fine-tuning it on a separate high-quality speech corpus while freezing the weights of the variance adaptor. The synthesized speech obtained in this way does not, at present, meet expectations in either quality or prosody, and the cause needs to be investigated. In addition, as a study of affect burst synthesis, in fiscal year 2022 we worked on the synthesis of screams and on a phonetic study of speech laugh.
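The per-session amplitude normalization mentioned in the abstract could look roughly like the sketch below. The abstract does not say how mean intensity is measured or which target level is used, so everything here is an assumption for illustration: the function names `rms_db` and `normalize_sessions` are hypothetical, intensity is taken as the RMS level pooled over all samples of a session, and the default target of -26 dB is an arbitrary choice.

```python
import math
from collections import defaultdict


def rms_db(samples):
    """Return the RMS level of a sample sequence in dB relative to full scale 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # A small floor avoids log(0) for all-zero (silent) input.
    return 10.0 * math.log10(max(mean_square, 1e-12))


def normalize_sessions(utterances, target_db=-26.0):
    """Scale utterances so that every session has the same mean intensity.

    utterances: list of (session_id, samples) pairs, samples as floats in [-1, 1].
    One gain is computed per session and applied to all of its utterances, so
    loudness differences between utterances inside a session are preserved.
    """
    # Pool the sample energy per session (assumed definition of "mean intensity").
    energy = defaultdict(float)
    count = defaultdict(int)
    for session_id, samples in utterances:
        energy[session_id] += sum(s * s for s in samples)
        count[session_id] += len(samples)

    # Convert the level difference to the target into a linear amplitude gain.
    gains = {}
    for session_id in energy:
        mean_square = energy[session_id] / count[session_id]
        session_db = 10.0 * math.log10(max(mean_square, 1e-12))
        gains[session_id] = 10.0 ** ((target_db - session_db) / 20.0)

    return [(sid, [s * gains[sid] for s in samples]) for sid, samples in utterances]
```

Computing the gain per session rather than per utterance is the design choice that matters here: it removes level differences caused by recording conditions while leaving the relative loudness of utterances within a session, which may carry prosodic information, untouched.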