Research on an Interactive Person Recognition System Based on Fused Analysis of Speech and Face Images

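Basic Information

Grant number:
07780379
Principal investigator:
松村雅史
Amount:
About $7,000
Host institution:
Osaka Electro-Communication University
Host institution country:
Japan
Project category:
Grant-in-Aid for Encouragement of Young Scientists (A)
Fiscal year:
1995
Funding country:
Japan
Period:
1995 to (no data)
Project status:
Completed

Project Abstract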

This study aims to develop a system that recognizes the person operating a terminal by extracting, with high accuracy, the individual characteristics of the voice and the features of the face during speech, and by using them jointly or selectively. Specifically, the goals are to develop an audio-visual fusion sensing system equipped with multiple visual and acoustic sensors, and to extract speaker-individuality information based on analysis of the speech production process. The results are as follows.

1. Development of an audio-visual fusion sensing system: The system mounts multiple video cameras and microphones on a terminal. First, a method was developed to estimate the position of the sound source (the lips) using four microphones. The method estimates, from the cross-correlation functions of the microphone signals, the phase differences that arise from the differences in distance between the source and each of the four microphones, and uses them to identify the source position. A source 50 cm from the terminal was located with an error within 2.4 cm. Next, for the case where the source position is known, a speaker-dependent matched filter was devised to extract the source signal from microphone signals containing ambient noise. Speaker verification experiments with 10 adult male speakers demonstrated its effectiveness.

2. Lip position estimation from color face images: A method was proposed for estimating lip position from color face images. Noting that the lips are redder than the surrounding skin, the method estimates the position of the lip region in color space. Lip position estimation experiments on face images of 10 subjects gave a 100% identification rate. Lip shape during continuous speech was also shown to be useful for normalizing face images.

3. Extraction of individuality information based on analysis of the speech production process: Using magnetic resonance imaging (MRI), the vocal tract shape, including the dental crowns, was measured precisely, yielding vocal tract shape data during the production of fricative consonants. Tongue-palate contact stress, which determines the intelligibility of spoken language, and the strength of velopharyngeal closure were also measured successfully. Furthermore, the acoustic characteristics of the vocal tract and nasal cavity were estimated and found to agree with analyses of real speech, showing that speech analysis grounded in the speech production process yields features that are effective for speaker identification.
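
The cross-correlation step in item 1 can be illustrated with a minimal Python sketch. It is not the project's implementation: it covers only the delay estimate between one pair of microphones, not the full four-microphone position solve. The function names, the 48 kHz sample rate, the 343 m/s speed of sound, and the synthetic noise signal are all assumptions made for illustration.

```python
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the source


def cross_correlation(ref, other, lag):
    """Sum of ref[n] * other[n + lag] over the samples where both exist."""
    start = max(0, -lag)
    stop = min(len(ref), len(other) - lag)
    return sum(ref[n] * other[n + lag] for n in range(start, stop))


def estimate_delay_samples(ref, other, max_lag):
    """Lag (in samples) that maximizes the cross-correlation.

    A positive result means `other` receives the sound later than `ref`.
    """
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(ref, other, lag))


def path_difference_m(delay_samples, sample_rate_hz):
    """Convert a delay in samples to a difference in path length (metres)."""
    return delay_samples / sample_rate_hz * SPEED_OF_SOUND


# Synthetic check: white noise reaching a second microphone 7 samples late.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(2000)]
true_delay = 7
mic_ref = source
mic_far = [0.0] * true_delay + source[:-true_delay]

delay = estimate_delay_samples(mic_ref, mic_far, max_lag=20)
extra_path = path_difference_m(delay, sample_rate_hz=48000)
print(delay, round(extra_path, 4))
```

With four microphones, each pair gives one such path-length difference, and the source position is the point most consistent with all of them. That geometric solve is omitted here.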