
Coordination of Multiple Behaviors for Competition Robots by Vision-Based Reinforcement Learning

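Basic Information

Grant number:
07455112
Principal investigator:
ASADA Minoru
Amount:
$48,000
Host institution:
Osaka University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
1995
Funding country:
Japan
Project period:
1995 to 1996
Project status:
Completed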

Project Abstract

Coordinating multiple behaviors that were acquired independently by reinforcement learning is one of the issues that must be solved before the method can scale to larger and more complex robot learning tasks. Directly combining the state spaces of all individual modules (subtasks) requires enormous learning time and causes hidden states. In the first year of this project, we proposed a method that accomplishes a whole task consisting of several subtasks by coordinating multiple behaviors acquired by vision-based reinforcement learning. In the second year, we modified the method by introducing modular learning, which coordinates multiple behaviors while taking into account the trade-off between learning time and performance.

The first year:
1. Individual behaviors that achieve the corresponding subtasks were acquired independently by Q-learning.
2. Three kinds of coordination of multiple behaviors were considered: simple summation of the different action-value functions, switching between action-value functions according to the situation, and learning with the previously obtained action-value functions as the initial values of a new action-value function.
3. A task of shooting a ball into the goal while avoiding collisions with an opponent was examined. The task can be decomposed into a ball-shooting subtask and a collision-avoidance subtask.
4. As a result, the learning method (the third kind of coordination) was the best of the three in shooting ratio, mean steps to the goal, and avoidance performance.

The second year:
1. To reduce the learning time, the whole state space was classified into two categories based on the action values obtained separately by Q-learning: the area where one of the learned behaviors was directly applicable (no-more-learning area), and the area where learning was necessary because of competition between multiple behaviors (re-learning area).
2. Hidden states were detected by fitting a model to the learned action values based on an information criterion.
3. The initial action values in the re-learning area were adjusted so that they were consistent with the values in the no-more-learning area.
4. The method was applied to one-on-one soccer-playing robots, and its validity was shown by computer simulation and real robot experiments.
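
The three kinds of coordination named in the abstract can be illustrated with a small tabular sketch. This is not the project's implementation: the shared state indexing, the toy tables `q_shoot` and `q_avoid`, the `opponent_near` flag, the use of the summed table as the initial values, and the "greedy actions disagree" test for the re-learning area are all assumptions made here for illustration. The abstract does not give the actual switching condition, initialization rule, or classification criterion.

```python
from typing import List

QTable = List[List[float]]  # q[state][action]; a tabular form is assumed here


def greedy(q_row: List[float]) -> int:
    """Index of the highest-valued action (the first one wins on ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def add_tables(q_a: QTable, q_b: QTable) -> QTable:
    """Element-wise sum of two action-value tables with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(q_a, q_b)]


def policy_by_summation(q_a: QTable, q_b: QTable) -> List[int]:
    """Coordination 1: act greedily on the summed action values."""
    return [greedy(row) for row in add_tables(q_a, q_b)]


def policy_by_switching(q_a: QTable, q_b: QTable, use_b: List[bool]) -> List[int]:
    """Coordination 2: per state, follow q_b where the (hypothetical) flag is set."""
    return [greedy(rb) if flag else greedy(ra)
            for ra, rb, flag in zip(q_a, q_b, use_b)]


def q_learning_update(q: QTable, s: int, a: int, reward: float, s_next: int,
                      alpha: float = 0.25, gamma: float = 0.9) -> None:
    """Standard one-step Q-learning update, applied in place."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def conflicting_states(q_a: QTable, q_b: QTable) -> List[int]:
    """Assumed stand-in for the re-learning area: greedy actions disagree."""
    return [s for s, (ra, rb) in enumerate(zip(q_a, q_b))
            if greedy(ra) != greedy(rb)]


# Toy tables with 3 states and 3 actions; the numbers are made up.
q_shoot = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
q_avoid = [[0.5, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.2, 0.1]]
opponent_near = [False, False, True]  # hypothetical switching condition

sum_policy = policy_by_summation(q_shoot, q_avoid)
switch_policy = policy_by_switching(q_shoot, q_avoid, opponent_near)

# Coordination 3: start a new table from earlier values (the sum is an
# assumption), then keep refining it with ordinary Q-learning updates.
q_new = add_tables(q_shoot, q_avoid)
q_learning_update(q_new, s=1, a=1, reward=1.0, s_next=2)

relearn = conflicting_states(q_shoot, q_avoid)
print(sum_policy, switch_policy, relearn)
```

In this toy setting the two tables already agree in state 0, so under the assumed criterion only the remaining states would need further learning. That mirrors, in a very simplified way, the second-year idea of restricting learning to the re-learning area.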
