
人間による音声情報処理過程の分析とそれを応用した音声対話インターフェイスの構築

Analysis of Human Speech Information Processing and Its Application to Building a Spoken Dialogue Interface

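Basic Information

Grant number:
15017225
Principal investigator:
Nobuaki Minematsu (峯松信明)
Amount:
$26,200
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year:
2003
Funding country:
Japan
Project period:
2003 to (no data)
Project status:
Completed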

Project Abstract

人間間の音声コミュニケーションを観測すると、音声の音響情報から様々なパラ言語情報,非言語情報を抽出することで円滑なコミュニケーションを実現していることが分かる。本研究では,パラ言語情報として発話意図に,また,非言語情報として話し手の知覚的年齢情報に着眼してその自動抽出を検討した。特に発話意図の抽出に関しては,音響音声学に立脚した音声工学とは完全に異なる観点からの音声モデリングを行なった。音声の物理現象の中に,話者・収録環境に依存しない普遍構造が存在することを実証しており,その普遍構造とパラ言語情報との関連について検討した。話者認識技術に基づいてユーザの知覚的年齢の推定を試みた。子供音声・成人・老人音声データベース(合計男女約1000人)に対してその音声聴取時に感じる年齢を,大学生30名を対象として聴取実験によりラベリングさせた。その結果より,データベース話者各々に対して知覚的年齢分布が定義される未知入力話者に対する知覚的年齢推定は,未知話者とデータベース話者との距離を尤度という形で求め,各データベース話者に付随する知覚的年齢分布を,この尤度を用いて期待値化することで推定した。実験の結果,機械による推定値と人間による推定値間の相関は0.9となった。音声ストリームを確率論的に状態系列として捉え,次に相対論的に状態間の関係のみに着眼し(構造化し),その関係を情報論的に定量化する。こうして構造化された音声は性別,年齢,話者,マイク,伝送特性などに一切影響を受けず話し手の脳から聞き手の脳にまで到達する。音響音声学が提供する音声表象は「歪んでいない音声は存在しない」と主張し,本研究で提案する新しい音声の物理表象では「人間が発声する限り音声は歪み得ない」と主張する構造を唯一歪ませるのがパラ言語情報であり,本研究では種々の感情・意図によって構造のサイズがどう変化するのか,及び構造そのものがどう歪むのか,について実験的検討を行ない良好な結果を得ることができた。

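Observing speech communication between humans shows that smooth communication is achieved by extracting various kinds of paralinguistic and non-linguistic information from the acoustic information in speech. This study focused on utterance intention as paralinguistic information and on the speaker's perceptual age as non-linguistic information, and examined their automatic extraction. For the extraction of utterance intention in particular, speech was modeled from a viewpoint completely different from that of speech engineering grounded in acoustic phonetics. We have demonstrated that the physical phenomena of speech contain a universal structure that does not depend on the speaker or the recording environment, and we examined how that universal structure relates to paralinguistic information. We also attempted to estimate the user's perceptual age based on speaker recognition technology. For databases of child, adult, and elderly speech (about 1,000 male and female speakers in total), 30 university students took part in a listening experiment and labeled the age they perceived when hearing each voice. From these results, a perceptual age distribution is defined for each database speaker. The perceptual age of an unknown input speaker was estimated by computing the distance between the unknown speaker and each database speaker in the form of a likelihood, and then taking the expectation of the perceptual age distributions attached to the database speakers using these likelihoods. In the experiment, the correlation between the machine estimates and the human estimates was 0.9. In the proposed modeling, the speech stream is first treated probabilistically as a sequence of states; next, in relative terms, attention is paid only to the relations between states (structuring); and those relations are then quantified information-theoretically. Speech structured in this way travels from the speaker's brain to the listener's brain entirely unaffected by gender, age, speaker identity, microphone, transmission characteristics, and the like. The speech representation provided by acoustic phonetics asserts that "undistorted speech does not exist," whereas the new physical representation of speech proposed in this study asserts that "as long as a human produces it, speech cannot be distorted." The only thing that distorts the structure is paralinguistic information. This study experimentally examined how the size of the structure changes with various emotions and intentions, and how the structure itself is distorted, and obtained good results.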
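
The perceptual age estimate described in the abstract (likelihoods against database speakers, used to take an expectation over their perceptual age distributions) can be illustrated with a minimal sketch. This is not the project's implementation. It assumes that the likelihoods arrive as per-speaker log-likelihoods from some speaker model the abstract does not specify, that they are normalized into weights summing to one, and that each perceptual age distribution is a histogram over a few age bins. The function name `estimate_perceptual_age`, the bin values, and the toy numbers are all hypothetical.

```python
import math


def estimate_perceptual_age(log_likelihoods, age_distributions, age_bins):
    """Likelihood-weighted expectation of perceptual age for an unknown speaker.

    log_likelihoods[i]   : log-likelihood of the input speech under the model
                           of database speaker i (assumed to be given)
    age_distributions[i] : listener-derived probabilities over age_bins
                           for database speaker i
    age_bins             : representative age in years for each bin
    """
    # Normalize likelihoods into weights; subtracting the max avoids underflow.
    peak = max(log_likelihoods)
    raw = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Mix the per-speaker age distributions using those weights.
    mixed = [
        sum(w * dist[k] for w, dist in zip(weights, age_distributions))
        for k in range(len(age_bins))
    ]

    # The estimate is the expected age under the mixed distribution.
    expected_age = sum(p * age for p, age in zip(mixed, age_bins))
    return expected_age, mixed


# Toy data (hypothetical): three database speakers, three age bins.
age_bins = [10, 30, 70]
age_distributions = [
    [0.8, 0.2, 0.0],  # speaker perceived mostly as a child
    [0.1, 0.8, 0.1],  # speaker perceived mostly as an adult
    [0.0, 0.2, 0.8],  # speaker perceived mostly as elderly
]
log_likelihoods = [-120.0, -100.0, -118.0]  # input is closest to speaker 2

age, distribution = estimate_perceptual_age(
    log_likelihoods, age_distributions, age_bins
)
print(f"estimated perceptual age: {age:.1f}")
```

Mixing the full distributions before taking the expectation, rather than averaging one mean age per speaker, keeps the estimated age distribution available as well. Both routes give the same expected age.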