音声・言語・画像情報の統合化による概念の獲得に関する研究

Research on Concept Acquisition through the Integration of Speech, Language, and Image Information

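Basic Information

Grant number:
03245209
Principal investigator:
中川聖一
Amount:
$12,800
Host institution:
Toyohashi University of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year:
1991
Funding country:
Japan
Project period:
1991 to (no data)
Project status:
Completed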

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-03245209/
Keywords:
speech information, image information, concept acquisition, learning, audiovisual information

Project Summary

This study aims to formulate a method of concept acquisition for unknown inputs by linking two external stimuli, vision and hearing. To that end, we built a system that forms concepts from image and speech information. The system uses visual information (images) and auditory information (speech) to have a computer learn concepts such as the names and positions of objects. As visual information, images of geometric figures captured by a camera are input, and the parameters needed for concept formation are extracted. The concepts formed in these experiments fall into five groups: (1) the existence of a figure, (2) its position, (3) its size, (4) its color, and (5) its shape. Parameters are extracted for each concept group. As auditory information, speech information is extracted from the speech signal: DP matching is performed between the time-series data of two utterances, and similar segments are extracted from the resulting optimal matching path and matching distance.

Next, we developed an algorithm that acquires concepts by associating speech with images, using the preprocessed data of spoken sentences and their related images. The concept acquisition algorithm we developed previously gave little consideration to errors in extracting image feature parameters or in extracting common speech segments. It also placed some restrictions on the order in which the speech-image training pairs were input. This fiscal year, we developed an algorithm that can acquire concepts under these conditions as well. For evaluation, we first ran simulation experiments using error-containing character strings in place of speech and image feature parameters in place of images, and confirmed that 13 concepts (for example, triangle, circle, white, large, left) were acquired correctly. We also ran evaluation experiments with actual speech and images as input. The results showed that the accuracy of common speech segment extraction strongly affects the performance of the system.
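
The summary describes DP matching between the time series of two utterances, followed by extraction of similar segments from the optimal matching path and the matching distance. The sketch below illustrates that general idea only; it is not the project's implementation. It assumes one scalar feature per frame (real speech would normally use feature vectors), an absolute-difference local distance, and a symmetric step pattern with no slope constraint, none of which the record specifies. The names `dp_match` and `similar_segments`, and the parameters `threshold` and `min_len`, are hypothetical.

```python
from typing import List, Sequence, Tuple

Path = List[Tuple[int, int]]


def dp_match(a: Sequence[float], b: Sequence[float]) -> Tuple[float, Path]:
    """Return (matching distance, optimal matching path) for two sequences.

    Assumption: the local distance is |a[i] - b[j]| and the allowed steps
    are (i-1, j), (i, j-1), and (i-1, j-1).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning a[:i+1] with b[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best

    # Backtrack from the end cell to recover the optimal matching path.
    i, j = n - 1, m - 1
    path: Path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def similar_segments(
    a: Sequence[float],
    b: Sequence[float],
    path: Path,
    threshold: float,
    min_len: int = 2,
) -> List[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Hypothetical helper: runs of at least min_len consecutive path points
    whose local distance is <= threshold, returned as (start, end) index pairs."""
    segments = []
    run: Path = []
    for i, j in path:
        if abs(a[i] - b[j]) <= threshold:
            run.append((i, j))
            continue
        if len(run) >= min_len:
            segments.append((run[0], run[-1]))
        run = []
    if len(run) >= min_len:
        segments.append((run[0], run[-1]))
    return segments


if __name__ == "__main__":
    x = [1, 2, 3]
    y = [1, 2, 2, 3]
    distance, path = dp_match(x, y)
    print(distance, path)
    print(similar_segments(x, y, path, threshold=0.5))
```

A threshold on the local distance along the path is only one way to pick out similar segments. Because the summary reports that the accuracy of common-segment extraction strongly affects system performance, the choice of `threshold` and `min_len` in a sketch like this would need tuning on real data.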