
人間による音声情報処理過程の分析とそれを応用した音声対話インターフェイスの構築

Analysis of Human Speech Information Processing and Construction of a Spoken Dialogue Interface Applying It

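Basic Information

Grant number:
16016219
Principal investigator:
Nobuaki Minematsu (峯松信明)
Amount:
$74,200 (approx.)
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year:
2004
Funding country:
Japan
Project period:
2004 to 2005
Project status:
Completed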

Project Abstract

Information conveyed by speech is classified into linguistic, paralinguistic, and non-linguistic information. The conventional speech information processing paradigm has built its methodology as follows: paralinguistic information is first separated from the physical phenomena of speech; then, from the resulting speech, which carries linguistic plus non-linguistic information, the linguistic information is extracted by summing over the non-linguistic information (that is, by collecting speech from thousands or tens of thousands of speakers). According to findings in auditory physiology and brain science, models have been proposed in which linguistic and non-linguistic information are processed separately in the brain. In other words, the two can be separated without "collecting". In this study, we proposed a framework for separating static non-linguistic information, such as speaker and acoustic equipment characteristics, from the physical phenomena of speech, taking the findings of brain science into account. As applications, we realized speaker-independent speech recognition using the speech of a single speaker, and high-accuracy extraction of paralinguistic information such as emotion and intention.

The speech spectrum, represented as a time series of cepstra, is converted into a sequence of distributions, and the distance between every pair of distributions is computed. When a distance between distributions in a non-Euclidean space is adopted (the Bhattacharyya distance), the distance is invariant to affine transformations, which are the mathematical model used to represent non-linguistic features. Computing all pairwise distances is equivalent to specifying the geometric structure spanned by all the distributions, and distance invariance therefore yields structural invariance. This means that we succeeded in giving a mathematical and physical interpretation to the branch of linguistics called structural phonology.

Based on this methodology, which represents speech structurally by attending only to sound differences (contrasts), we examined speech recognition and emotion/intention estimation. For the former, although the task was the very limited one of isolated vowel sequences, the method achieved higher accuracy than acoustic models built from the speech data of more than 4,000 speakers. For the latter, adding pitch-related information yielded a method more accurate than conventional ones.
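The invariance claim in the abstract can be checked numerically. The following is a minimal sketch, not the project's implementation: it assumes each distribution is a multivariate Gaussian (the abstract does not state the distribution family), uses the standard closed-form Bhattacharyya distance for Gaussians, and applies one shared invertible affine map `x -> a @ x + b` to every distribution. The function names, the number of distributions, the dimensionality, and the random data are all illustrative assumptions.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians (assumed model)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mean term: (1/8) * diff^T * cov^{-1} * diff
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def distance_matrix(means, covs):
    """All pairwise distances; this matrix is the 'structure' of the utterance."""
    n = len(means)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = bhattacharyya_distance(
                means[i], covs[i], means[j], covs[j]
            )
    return d


def affine_transform(means, covs, a, b):
    """Map each Gaussian through x -> a @ x + b (hypothetical non-linguistic change)."""
    new_means = [a @ m + b for m in means]
    new_covs = [a @ c @ a.T for c in covs]
    return new_means, new_covs


# Illustrative random data: 5 distributions in a 3-dimensional feature space.
rng = np.random.default_rng(0)
dim, n = 3, 5
means = [rng.normal(size=dim) for _ in range(n)]
covs = []
for _ in range(n):
    m = rng.normal(size=(dim, dim))
    covs.append(m @ m.T + dim * np.eye(dim))  # symmetric positive definite

a = rng.normal(size=(dim, dim)) + 3.0 * np.eye(dim)  # invertible with probability one
b = rng.normal(size=dim)

d_before = distance_matrix(means, covs)
d_after = distance_matrix(*affine_transform(means, covs, a, b))

print("max abs difference:", np.max(np.abs(d_before - d_after)))
```

The two distance matrices agree up to floating-point error because the factor `a` cancels in both the mean term and the determinant ratio. The invariance holds only when the same invertible map is applied to every distribution in the sequence.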