構造不変の定理に基づく音声アフォーダンスの提案とそれに立脚した音声認識系の構築

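Proposal of speech affordance based on the theorem of structural invariance, and construction of a speech recognition system founded on it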

Basic information

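Grant number:
18049018
Principal investigator:
峯松信明 (Nobuaki Minematsu)
Amount:
About $21,100
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year:
2006
Funding country:
Japan
Period:
2006 to (no data)
Project status:
Completed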

Project abstract

When linguistic and paralinguistic information is extracted from speech, the acoustic distortions introduced by differences in age, gender, and recording equipment are pure noise. Conventionally, this noise has been handled by collecting large amounts of speech data and building statistical acoustic models from them. Rather than solving the problem by collecting more data, this research addressed it by mathematically formulating a speech model (speech affordance) from which the dimensions that represent these distortions have been eliminated. A speech stream is converted into a sequence of distributions, and the distance between every pair of distributions, including pairs that are far apart in time, is computed with a measure called the Bhattacharyya distance. Computing the distance between every pair of events (that is, computing the distance matrix) is equivalent to specifying a geometric structure, and using the Bhattacharyya distance as the measure guarantees that this structure stays invariant when the space is warped.

Earlier work examined the validity of this speech representation on sequences of isolated vowels; this fiscal year the study was extended to continuous speech. In that setting, problems arise as the number of states increases, but assuming structural invariance in subspaces as well produced a large improvement in the recognition rate. Specifically, in experiments on a recognition task of 120 words formed by permuting the five Japanese vowels (5! = 120), the recognition rate was 93% at the word level and 97% at the vowel level. This shows that words can be recognized, and vowels identified, without using any absolute physical quantities of speech. Sound identification has conventionally relied on absolute acoustic features (which is why acoustic distortion gets mixed in); these results show that speech recognition is possible within an entirely different framework. In this framework, only a very small number of speakers is needed for model training.

Note that, in principle, this method cannot identify isolated sounds. In other words, it is an algorithm that identifies words without identifying individual sounds. A disorder that presents similar symptoms is developmental dyslexia, a condition in which difficulty appears only in reading and writing characters. This research may provide a model that explains this symptom in physical terms, and it led to a range of discussions at conferences on language disorders.
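
The distance-matrix idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: each event is modeled as a one-dimensional Gaussian, the closed-form Bhattacharyya distance for that case is used, and the speaker or channel distortion is modeled as an affine map x → a·x + b. The function names and the sample values are hypothetical.

```python
import math


def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between the 1-D Gaussians N(mu1, var1) and N(mu2, var2)."""
    var_sum = var1 + var2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    var_term = 0.5 * math.log(var_sum / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def distance_matrix(dists):
    """Distances between every pair of distributions, including temporally distant pairs."""
    n = len(dists)
    return [[bhattacharyya_1d(*dists[i], *dists[j]) for j in range(n)]
            for i in range(n)]


def affine(dists, a, b):
    """Apply x -> a*x + b to each Gaussian (assumed stand-in for an acoustic distortion)."""
    return [(a * mu + b, a * a * var) for mu, var in dists]


# Hypothetical utterance: five (mean, variance) pairs, one per vowel-like event.
utterance = [(1.0, 0.5), (3.0, 0.8), (-2.0, 0.3), (0.5, 1.2), (4.0, 0.6)]

original = distance_matrix(utterance)
distorted = distance_matrix(affine(utterance, a=1.7, b=-3.0))

# Largest change in any pairwise distance caused by the distortion.
max_diff = max(abs(original[i][j] - distorted[i][j])
               for i in range(len(utterance))
               for j in range(len(utterance)))
print(f"largest change in any pairwise distance: {max_diff:.2e}")
```

The printed difference is at the level of floating-point rounding: the individual means and variances change under the distortion, but the matrix of pairwise distances does not. A single distribution carries no information in this representation, which is consistent with the abstract's remark that isolated sounds cannot be identified.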