Research on the Fusion of Sensor Signal Integration Learning and Reinforcement Learning

Original title: センサ信号統合化学習と強化学習の融合に関する研究

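Basic Information

Grant number: 08233204
Principal investigator: 柴田克成 (Katsunari Shibata)
Amount: $19,200
Host institution: The University of Tokyo
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year: 1996
Funding country: Japan
Project period: 1996 to (end date not available)
Project status: Completed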

Project Abstract

This research aimed to fuse sensor signal integration learning with reinforcement learning so that a system can flexibly and efficiently learn how to generate goal-achieving actions from the signals of many sensor cells that each have only a local receptive field, as in a visual sensor. Within reinforcement learning, in order to learn to predict from the current state the time required to reach the goal, the authors have had a neural network compute the prediction (evaluation) value and have trained that network by time-axis smoothing learning, which drives the second time derivative of the value toward 0. In the course of this research, however, it became clear that once evaluation over multiple paths and similar factors is taken into account, it is necessary not only to make the second time derivative of the predicted value 0 but also to make its amount of change over time constant. It also became clear that, in the learning that makes the temporal change constant, the past predicted value must be trained using the current predicted value as the reference. As a result of using this method, it was found that visual sensor signals can be handled directly by reinforcement learning, without using sensor signal integration learning. Conventionally, when reinforcement learning was applied to visual sensor signals, the signals were preprocessed by a human-written program and divided into a number of discrete states, and an action was learned for each state. This caused a problem in terms of adaptability. With the present method, however, it was found that for simple problems learning is possible even when the visual sensor signals are input directly. In that case, the hidden-layer neurons of the neural network were found to integrate the signals of sensors that have only local receptive fields and to represent spatial information efficiently. In addition, simulations in which the motion characteristics of the system were changed showed that the hidden-layer neurons have an adaptive ability to represent the regions needed for learning in magnified form.
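The update rule described in the abstract (train the past predicted value against the current one so that the value changes by a constant amount per time step) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formula, and the original work used a neural network on visual sensor signals. Here a lookup table over a hypothetical 1-D chain of states stands in for the network, and the names `train_chain`, `step`, `goal_value`, and `lr`, along with all numeric values, are assumptions chosen for illustration.

```python
# Minimal sketch under stated assumptions: a lookup table replaces the
# neural network, and the task is a hypothetical 1-D chain ending at the goal.

def train_chain(n_states=6, step=0.1, goal_value=1.0, lr=0.5, episodes=200):
    """Learn predicted values whose change per time step is constant.

    All parameter names and defaults are illustrative assumptions.
    """
    values = [0.0] * n_states
    goal = n_states - 1
    for _ in range(episodes):
        values[goal] = goal_value            # anchor the prediction at the goal
        for t in range(1, n_states):         # follow the trajectory 0 -> goal
            prev_state, curr_state = t - 1, t
            # The CURRENT prediction is the reference; the PAST prediction is
            # pulled toward it minus a constant per-step change.
            target = values[curr_state] - step
            values[prev_state] += lr * (target - values[prev_state])
    return values


def second_differences(values):
    """Discrete analogue of the second time derivative along the trajectory."""
    return [values[i + 1] - 2 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)]


if __name__ == "__main__":
    learned = train_chain()
    print("values:", [round(v, 3) for v in learned])
    print("second differences:", [round(d, 3) for d in second_differences(learned)])
```

In this toy setting the learned values settle onto a straight line that rises by `step` per state toward the goal, so the second differences also go to zero. One way to read the abstract's point: a zero second derivative alone leaves the slope undetermined, whereas fixing the per-step change also fixes the slope, which gives trajectories of different lengths a common scale.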