動画像を用いた視聴覚融合音声認識システム

Audio-Visual Fusion Speech Recognition System Using Moving Images

Basic Information

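Grant number:
07780343
Principal investigator:
荻原昭夫 (Akio Ogihara)
Amount:
$5,800
Host institution:
Osaka Prefecture University
Host institution country:
Japan
Project category:
Grant-in-Aid for Encouragement of Young Scientists (A)
Fiscal year:
1995
Funding country:
Japan
Period:
1995 to (no data)
Project status:
Completed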

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-07780343/
Keywords:
Speech recognition, audio-visual fusion, sensor fusion, HMM, moving images, audio-visual information

Project Abstract

本課題では,動画像を使用した視聴覚融合による人間と計算機との対話システムの実現への第一段階として,比較的発話時間の短い文章(人間から計算機への一方通行)を対象とした視聴覚融合による音声認識システムを実現する事を目的とし,動画像を用いた視聴覚融合音声認識システムに関する研究を行なった.本研究で構築を行なった「フルフレーム画像を対象とした視聴覚融合音声認識システム」では,視聴覚情報の入力手段として “音声同期型動画像入力機能を有しているマルチメディアパソコン” を用いて,フルフレーム(30fps,1秒間当たり30フレーム)の動画像をディジタル形式のデータとして撮影する.その後,この動画像を対象として,HMM(隠れマルコフモデル)に多次元ベクトル量子化を組み合わせた認識モデル上で,特徴抽出処理,視聴覚融合処理,音声認識処理の各処理を行なう.なお,本システムでは,「視覚情報用(動画像用)HMMにより算出された対数尤度」と「聴覚情報用(音声用)HMMにより算出された対数尤度」とを1次結合するというシンプルかつ効果的な手法により視聴覚融合処理を実現している.本システムを用いて音声認識実験を行なった結果,・母音発声時の音声認識精度の向上・唇の動きが速いために動画像による認識が困難であった子音に対する効果を確認した.さらに,ニューラルネットワークを利用した視聴覚融合処理方式についても検討を進めており,今後は音声認識システムへの実装を試みる予定である.なお,上述のシステムの構築,および,実験評価の実施に際して,本科学研究費補助金研究により購入した設備備品を使用した.

As a first step toward a human-computer dialogue system based on audio-visual fusion using moving images, this project aimed to build an audio-visual fusion speech recognition system for relatively short spoken sentences (one-way communication from human to computer), and conducted research on audio-visual fusion speech recognition using moving images. The "audio-visual fusion speech recognition system for full-frame images" built in this study uses a "multimedia PC with audio-synchronized video input" to capture the audio-visual information, recording full-frame video (30 fps, i.e., 30 frames per second) as digital data. The video is then processed through feature extraction, audio-visual fusion, and speech recognition, all carried out on a recognition model that combines HMMs (hidden Markov models) with multidimensional vector quantization. The system performs audio-visual fusion with a simple yet effective method: a linear combination of the log-likelihood computed by the HMM for visual information (video) and the log-likelihood computed by the HMM for auditory information (speech). Speech recognition experiments with this system confirmed (1) improved recognition accuracy for vowel utterances and (2) a benefit for consonants, which had been difficult to recognize from video because the lips move quickly. We are also investigating an audio-visual fusion method based on neural networks and plan to implement it in the speech recognition system. The equipment purchased with this Grant-in-Aid for Scientific Research was used to build the system described above and to carry out the experimental evaluation.
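The recognition pipeline described in the abstract (vector quantization, discrete HMMs, and fusion by a linear combination of the visual and audio log-likelihoods) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the abstract does not give the feature type, codebook size, HMM topology, or fusion weights, so the Euclidean nearest-codeword quantizer, the forward-algorithm scoring of a discrete-observation HMM, and the weighting `weight_visual * log P_visual + (1 - weight_visual) * log P_audio` are assumptions. All function names and the `weight_visual` parameter are hypothetical.

```python
import numpy as np


def vector_quantize(features, codebook):
    """Map each feature vector (T x D) to the index of its nearest codeword (K x D).

    Assumption: plain Euclidean distance; the original quantizer is not specified.
    """
    # Squared distances between every frame and every codeword, shape (T, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def discrete_hmm_log_likelihood(symbols, log_pi, log_A, log_B):
    """Forward algorithm in the log domain for a discrete-observation HMM.

    log_pi: (N,)   initial state log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] = log P(state j | state i)
    log_B:  (N, K) emission log-probabilities over the K codebook symbols
    """
    log_alpha = log_pi + log_B[:, symbols[0]]
    for s in symbols[1:]:
        # log-sum-exp over previous states i, for each next state j
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0) + log_B[:, s]
    # Total log-likelihood: log-sum-exp over final states
    return np.logaddexp.reduce(log_alpha)


def fused_score(log_lik_visual, log_lik_audio, weight_visual):
    """Linear combination of visual and audio log-likelihoods (hypothetical weighting)."""
    return weight_visual * log_lik_visual + (1.0 - weight_visual) * log_lik_audio


def recognize(visual_symbols, audio_symbols, models, weight_visual=0.3):
    """Return the word with the highest fused score, plus all per-word scores.

    models maps each word to (visual_hmm, audio_hmm), where each HMM is a
    (log_pi, log_A, log_B) tuple. The default weight is illustrative only.
    """
    scores = {}
    for word, (visual_hmm, audio_hmm) in models.items():
        lv = discrete_hmm_log_likelihood(visual_symbols, *visual_hmm)
        la = discrete_hmm_log_likelihood(audio_symbols, *audio_hmm)
        scores[word] = fused_score(lv, la, weight_visual)
    best = max(scores, key=scores.get)
    return best, scores
```

In use, video frames and audio frames would each be quantized against their own codebook, and `recognize(visual_symbols, audio_symbols, models)` then picks the word model with the best combined score. Setting `weight_visual` to 0 reduces the sketch to audio-only recognition, which is one way to measure what the visual stream contributes.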