人間の感覚と整合する音声特徴空間の構築

Construction of a Speech Feature Space Consistent with Human Perception

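Basic information

Grant number:
22K19793
Principal investigator:
北岡教英
Amount:
$40,800
Host institution:
Toyohashi University of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Challenging Research (Exploratory)
Fiscal year:
2022
Funding country:
Japan
Project period:
2022-06-30 to 2025-03-31
Project status:
Ongoing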

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22K19793/
Keywords:
speech feature space; speech synthesis; speech recognition; speaker embedding; speech feature distance

Project summary

The aim of this project is to construct a speech feature space that is consistent with human perception. When emotion is added in speech synthesis, "calm" and "joy" can each be imparted, but "slight joy" cannot be obtained by interpolating between them. Likewise, in speech recognition, data from "young" and "elderly" speakers can improve recognition performance for those two groups, but it does not improve performance for "middle-aged" speakers. To work toward this aim, we first began building a speech synthesizer that synthesizes a voice intermediate between two speakers. Specifically, we built a multi-speaker speech synthesizer based on Tacotron 2 that produces the voices of multiple speakers when given a speaker embedding. Its output speech is fed to a speaker classifier that discriminates between the two target speakers, and we define a loss that drives the classification result to be equal for both speakers (that is, the cross-entropy against the case in which both speakers have probability 0.5). The feature space used for speaker classification is the mel-spectral space, which is considered close to human auditory perception. In addition, to guarantee that the linguistic content of the speech is preserved, the speech is also fed to a speech recognizer, and the recognition errors of the synthesized speech, measured against the correct text that was meant to be synthesized, are used as a loss. By backpropagating these losses, we attempt to synthesize speech that is clear and equally close to both speakers. This system is now nearly complete, and we will evaluate it next.
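The two-speaker loss described in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the function names (`log_softmax`, `balanced_speaker_loss`, `total_loss`), the weighting parameter `asr_weight`, and the idea of adding the two losses with a single scalar weight are all hypothetical. The summary states only that the speaker loss is a cross-entropy against probabilities of 0.5 for each speaker and that recognition errors form a second loss; how the recognition loss is computed is not specified, so it is passed in here as a plain number.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of classifier logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def balanced_speaker_loss(logits):
    """Cross-entropy between the target (0.5, 0.5) and the two-speaker posterior.

    `logits` holds the speaker classifier's two raw scores for the
    synthesized utterance (hypothetical interface).
    """
    target = [0.5, 0.5]
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))


def total_loss(speaker_logits, asr_loss, asr_weight=1.0):
    """Combine both losses; the weighted sum is an assumption, not from the source."""
    return balanced_speaker_loss(speaker_logits) + asr_weight * asr_loss


if __name__ == "__main__":
    # Equal logits: the classifier cannot tell the speakers apart.
    print(balanced_speaker_loss([0.0, 0.0]))
    # Unequal logits: the output sounds much more like speaker 0.
    print(balanced_speaker_loss([3.0, 0.0]))
```

The speaker loss reaches its minimum, ln 2, exactly when the classifier assigns probability 0.5 to each speaker, and it grows as the synthesized voice leans toward either one. In a real system the same quantity would be computed on differentiable tensors so that it can be backpropagated into the synthesizer.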