Development of the adaptive agent for homeostasis and analysis of its cognitive development

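Basic information

Grant number:
22KJ0907
Principal investigator:
吉田尚人
Amount:
$10,900 (approx.)
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for JSPS Fellows
Fiscal year:
2023
Funding country:
Japan
Project period:
2023-03-08 to 2024-03-31
Project status:
Completed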

Project summary

We built a simulation environment for a quadruped robot facing a two-resource problem, in which actions are defined as continuous-valued vectors. We also built an environment in which the robot has body temperature as an internal bodily state, which made it possible to compare methods across two different experimental environments. As originally planned, for a small-scale experimental setup we assumed a laser sensor mounted on top of the robot for detecting food, and optimized behavior with Proximal Policy Optimization, a deep reinforcement learning method. With a homeostatic reward, we confirmed that the agent develops emergent behaviors that sustain its survival in both environments (walking, stopping, body-temperature control, navigation, and foraging control). In the course of this work we found that fixing the interoceptive signals at specific values does not greatly affect the agent's bodily movement, such as walking. In both environments we compared other existing reward settings that are said to achieve homeostasis with the homeostatic reward used in this study, which is derived from computational neuroscience. The comparison showed that the homeostatic reward is superior in learning efficiency. We further examined this point mathematically from the viewpoint of Reward Shaping and showed that the homeostatic reward corresponds to initializing the value function in a reasonable form with respect to a more fundamental reward setting. To analyze the behavior of the optimized agent, we placed two types of food near the agent and, exploiting the finding above, fixed the agent's interoceptive signals at various values to evaluate how its foraging behavior changes with its nutritional state. Building on these findings, we constructed an agent with a convolutional neural network that takes images as input instead of the laser sensor, and succeeded in optimizing homeostatic agents for both RGB and RGB-D images. In addition, we discovered a neural network structure that is specifically effective for homeostatic reinforcement learning and compared its performance with a fully connected structure.
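
The summary does not state the exact form of the homeostatic reward. As a minimal sketch, assuming the drive-reduction form often used in computational-neuroscience models of homeostatic reinforcement learning (the reward is the decrease in distance between the internal state and a setpoint), the link to Reward Shaping can be illustrated as follows. The function names `drive`, `homeostatic_reward`, and `shaping_term`, the Euclidean distance, the setpoint, and all numeric values are illustrative assumptions and are not taken from the project.

```python
import math


def drive(h, setpoint):
    """Distance of the internal state h from its setpoint (assumed Euclidean)."""
    return math.sqrt(sum((s - x) ** 2 for x, s in zip(h, setpoint)))


def homeostatic_reward(h_now, h_next, setpoint):
    """Assumed drive-reduction reward: positive when the state moves toward the setpoint."""
    return drive(h_now, setpoint) - drive(h_next, setpoint)


def shaping_term(h_now, h_next, setpoint, gamma=1.0):
    """Potential-based shaping term gamma * Phi(s') - Phi(s) with Phi = -drive."""
    phi_now = -drive(h_now, setpoint)
    phi_next = -drive(h_next, setpoint)
    return gamma * phi_next - phi_now


# Hypothetical two-resource internal state (e.g. two nutrient levels).
setpoint = [0.0, 0.0]
h_now = [-0.6, 0.3]    # deficit in resource 1, slight surplus in resource 2
h_next = [-0.4, 0.3]   # after consuming some of resource 1

r = homeostatic_reward(h_now, h_next, setpoint)
f = shaping_term(h_now, h_next, setpoint, gamma=1.0)
print(f"homeostatic reward: {r:.4f}, shaping term: {f:.4f}")
```

Under these assumptions and with `gamma=1.0`, the drive-reduction reward equals a potential-based shaping term whose potential is the negative drive. Potential-based shaping is generally understood to act like an initialization of the value function with that potential, which is one way to read the summary's statement about value-function initialization.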