
評価と動作の並列学習により障害回避を自己形成する自律能動学習機械の研究

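Study of an autonomous active learning machine that self-forms obstacle avoidance through parallel learning of evaluation and action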

Basic information

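Grant number: 07780305
Principal investigator: 柴田克成
Amount: $7,700
Host institution: The University of Tokyo
Host institution country: Japan
Project category: Grant-in-Aid for Encouragement of Young Scientists (A)
Fiscal year: 1995
Funding country: Japan
Period: 1995 to (no data)
Project status: Completed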

Project abstract

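This research addresses delayed reinforcement learning with neural networks and has proceeded along two main themes: how to make the system learn obstacle avoidance, and how to process signals arriving from a large number of sensor cells.

On the first theme, we began by simulating a task in which a robot catches a target. The notion of an obstacle was generalized: the robot's motion characteristics were made to vary with the spatial relationship between the robot and the target. The simulations showed that the conventional approach, which trains the evaluation function (a measure of how close the system is to achieving its goal) only by driving the second time derivative of that function toward zero, does not let the robot acquire the optimal path to the target. Subsequent analysis revealed the cause: the slope of the evaluation function over time (its first derivative) changes from trial to trial, so correct evaluation is impossible. We therefore devised a method that maintains a time average of the first derivative of the evaluation function and trains the first derivative to approach that average, and verified it in simulation. Separately, to refine the trial-and-error procedure and exploit it for obstacle avoidance, we tried learning the amplitude of the random numbers used for trial and error, but this has not yet worked well.

On the second theme, we investigated whether a system receiving signals from many sensor cells could learn to integrate them and convert them into a form that is convenient for reinforcement learning. Based on the hypothesis that spatial information changes smoothly over time, we proposed obtaining an analog output that integrates the many sensor signals by training a neural network, which takes those signals as input, so that the second time derivative of its output approaches zero. In a simulation with 30 retinal cells arranged in one dimension and an object undergoing simple harmonic motion in front of them, the network's output came to represent the object's position through learning alone, without any external teacher signal.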
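The abstract gives no equations for the two training rules for the evaluation function. The following is a minimal Python sketch, assuming a discrete-time trial that yields evaluation values P[0], P[1], ..., with finite differences standing in for time derivatives and an exponential moving average standing in for the "time average". The names `second_derivative_errors` and `AverageSlopeRule` and the averaging rate are hypothetical, not taken from the original work. The sketch illustrates the failure mode described in the abstract: any linear ramp, whatever its slope, has zero second-derivative error, so that rule alone cannot fix a slope that is consistent across trials, whereas the average-slope rule reports how far each step's slope deviates from the running average.

```python
from typing import List, Sequence


def second_derivative_errors(values: Sequence[float]) -> List[float]:
    """Conventional rule (sketch): error_t = P[t+1] - 2*P[t] + P[t-1].

    Training would drive these errors toward zero, i.e. drive the discrete
    second time derivative of the evaluation P toward zero.
    """
    return [values[t + 1] - 2.0 * values[t] + values[t - 1]
            for t in range(1, len(values) - 1)]


class AverageSlopeRule:
    """Modified rule (sketch): keep a running time average of dP/dt and
    compare each step's slope against it. The exponential moving average
    and its rate are assumptions for illustration."""

    def __init__(self, rate: float = 0.1, initial_slope: float = 0.0) -> None:
        self.rate = rate                # hypothetical averaging rate
        self.avg_slope = initial_slope  # running time average of dP/dt

    def errors(self, values: Sequence[float]) -> List[float]:
        errs = []
        for t in range(len(values) - 1):
            slope = values[t + 1] - values[t]    # discrete first derivative
            errs.append(slope - self.avg_slope)  # deviation from the average
            # update the running average only after computing the error
            self.avg_slope += self.rate * (slope - self.avg_slope)
        return errs


if __name__ == "__main__":
    # Two hypothetical trials whose evaluations rise at different slopes.
    shallow = [0.05 * t for t in range(6)]
    steep = [0.2 * t for t in range(6)]

    # Both ramps satisfy the second-derivative rule (errors are ~0),
    # so that rule cannot tell which slope is appropriate.
    print(second_derivative_errors(shallow))
    print(second_derivative_errors(steep))

    # The average-slope rule, having seen the shallow trial first,
    # flags every step of the steep trial as rising too fast.
    rule = AverageSlopeRule(rate=0.5)
    rule.errors(shallow)
    print(rule.errors(steep))
```

Computing each error before updating the average means a step is judged against the slopes seen before it, which is one simple reading of "approach the time average"; the original work may differ.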

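The abstract does not specify the network, how the retina encodes the object, or how the trivial solution is excluded (a constant output also has zero second derivative). The following minimal Python sketch therefore only evaluates a smoothness objective of this kind in the 30-cell, simple-harmonic-motion setting; it does not reproduce the learning procedure. Gaussian receptive fields, the motion amplitude and sampling rate, the division by output variance (one possible way to rule out a constant output), and all function names are assumptions. It compares two fixed linear readouts: one whose output is proportional to the object's position, and one that alternates signs across cells and so does not track position.

```python
import math
from typing import List, Sequence

N_CELLS = 30                    # 30 retinal cells in one dimension (from the abstract)
CENTER, AMPLITUDE = 14.5, 8.0   # assumed range of the oscillating object, in cells
STEPS_PER_PERIOD = 200          # assumed sampling resolution
SIGMA = 1.0                     # assumed receptive-field width, in cells


def retina_frame(position: float) -> List[float]:
    """Activity of each cell for an object at `position` (Gaussian blob; assumption)."""
    return [math.exp(-((i - position) ** 2) / (2 * SIGMA ** 2)) for i in range(N_CELLS)]


def object_positions(n_steps: int) -> List[float]:
    """Simple harmonic motion of the object in front of the retina."""
    omega = 2 * math.pi / STEPS_PER_PERIOD
    return [CENTER + AMPLITUDE * math.sin(omega * t) for t in range(n_steps)]


def readout(frames: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Linear output unit: weighted sum of the cell activities at each time step."""
    return [sum(w * a for w, a in zip(weights, frame)) for frame in frames]


def smoothness_loss(outputs: Sequence[float]) -> float:
    """Mean squared discrete second derivative divided by the output variance.

    The variance normalization (an assumption) rules out a constant output;
    it fails with ZeroDivisionError if the output is exactly constant.
    """
    d2 = [outputs[t + 1] - 2 * outputs[t] + outputs[t - 1]
          for t in range(1, len(outputs) - 1)]
    mean = sum(outputs) / len(outputs)
    var = sum((y - mean) ** 2 for y in outputs) / len(outputs)
    return (sum(e * e for e in d2) / len(d2)) / var


if __name__ == "__main__":
    frames = [retina_frame(x) for x in object_positions(400)]
    # Weights equal to the cell index give an output proportional to position.
    position_like = readout(frames, [float(i) for i in range(N_CELLS)])
    # Alternating weights respond to fine structure, not to position.
    alternating = readout(frames, [(-1.0) ** i for i in range(N_CELLS)])
    print(smoothness_loss(position_like), smoothness_loss(alternating))
```

With these settings the position-like readout has a far smaller normalized second derivative than the alternating one, which is consistent with the smoothness hypothesis favoring an output that represents position. Note that the position of an object in simple harmonic motion does not have an exactly zero second derivative; "smooth" here means small relative to the output's variance at a fine sampling rate.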