
Research on Speech Recognition, Speaker Recognition, and Indexing of Distant Speech in Real-World Environments
(実世界環境下における遠隔発話の音声認識と話者認識およびインデックス化に関する研究)

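Basic Information

Grant number: 19650040
Principal investigator: 中川聖一 (Seiichi Nakagawa)
Amount: $20,500
Host institution: Toyohashi University of Technology
Host institution country: Japan
Project category: Grant-in-Aid for Challenging Exploratory Research
Fiscal year: 2007
Funding country: Japan
Project period: 2007 to 2009
Project status: Completed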

Project Abstract

For speech recognition of distant speech, we developed a recognition method that uses the techniques developed in FY2008 and FY2009 (H20 and H21) for identifying the speaker's position and speaking direction. Specifically, speech is enhanced by a microphone-array beamformer steered according to the estimated sound source position, and the direction the speaker is facing is used to estimate and restrict the spoken vocabulary; together these raised the recognition rate. In addition, we developed a technique that allows cepstral mean normalization, a basic method for reverberation compensation, to be applied online from a short utterance. Speech is modeled in advance with a Gaussian mixture model (GMM), each frame of the input speech is associated with a GMM component, and the frame is normalized with a cepstral mean normalization amount learned beforehand for that component. Whereas the conventional method needs an utterance several words long, the effect of normalization was confirmed even for a single-word utterance.

For speaker recognition of distant speech, we developed a recognition method that applies the combination of spectral information (MFCC) and phase information, developed in FY2008 and FY2009, to speech enhanced by the microphone array.

For indexing, we developed a method that post-processes the speech recognition and speaker recognition results to extract proper names such as place names, personal names, and organization names. Proper names could be extracted with fairly high accuracy from text input, but because speech recognition of distant speech is very difficult, satisfactory results were not obtained on recognized speech.
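
The GMM-based online cepstral mean normalization mentioned in the abstract can be illustrated with a minimal sketch. It is not the project's implementation. It assumes a diagonal-covariance GMM, a hard assignment of each frame to its most likely component (the abstract only says frames are "associated" with components), and a table of per-component offsets learned beforehand. All names (`best_component`, `normalize_frames`, `offsets`) and all numeric values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def best_component(frame, weights, means, variances):
    """Index of the GMM component with the highest weighted log-likelihood.

    Hard assignment is an assumption of this sketch; a posterior-weighted
    (soft) assignment would be an equally plausible reading of the abstract.
    """
    scores = [
        math.log(w) + log_gauss_diag(frame, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def normalize_frames(frames, weights, means, variances, offsets):
    """Subtract from each frame the pre-learned offset of its component.

    Because the offsets are looked up rather than averaged over the
    utterance, normalization can start from the first frame.
    """
    normalized_frames = []
    for frame in frames:
        k = best_component(frame, weights, means, variances)
        normalized_frames.append([c - o for c, o in zip(frame, offsets[k])])
    return normalized_frames


# Toy two-component model over two-dimensional cepstral vectors
# (hypothetical values, chosen only to make the example easy to follow).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
offsets = [[0.5, -0.5], [1.0, 1.0]]  # per-component normalization amounts

frames = [[0.2, -0.1], [3.9, 4.2]]
normalized = normalize_frames(frames, weights, means, variances, offsets)
print(normalized)
```

In conventional cepstral mean normalization the mean is estimated from the utterance itself, which is why a longer utterance is needed. Looking the offset up per component removes that dependency, which is consistent with the single-word result reported in the abstract.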