Scalable Autonomous Reinforcement Learning - From scratch to less and less structure

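Basic information

Grant number:
260194412
Principal investigator:
Professor Dr. Joschka Bödecker, since 4/2015
Funding amount:
--
Host institution:
Institut für Informatik (IIF)
Host institution country:
Germany
Project category:
Priority Programmes
Fiscal year:
2014
Funding country:
Germany
Duration:
2013-12-31 to 2020-12-31
Project status:
Completed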

Source:
https://gepris.dfg.de/gepris/projekt/260194412?language=en
Keywords:
Scalable Autonomous Reinforcement Learning scratch

Project abstract

Over the course of the last decade, the framework of reinforcement learning (RL) has developed into a promising tool for learning a large variety of different tasks in robotics. During this timeframe, a lot of progress has been made towards scaling reinforcement learning to high-dimensional systems and solving tasks of increasing complexity. Unfortunately, this scalability has been achieved by using expert knowledge to pre-structure the learning problem in several dimensions. As a consequence, the state-of-the-art methods in robot reinforcement learning generally depend on hand-crafted state representations, pre-structured parametrized policies, well-shaped reward functions and demonstrations by a human expert to aid scaling of the learning algorithm.
In this proposal, we want to advance the field by starting with a 'classical' reinforcement learning setting for a challenging robotic task (i.e., tetherball). Solving this task by RL methods will already be a valuable contribution. From there on, we will start to identify the components for which the learning task design still needs engineering experience. In the course of this proposal, we show how we aim to drive each of these components towards more autonomy while developing highly scalable approaches.
To this end, we will develop systematic methods to increase the autonomy of the learning system by going beyond traditional approaches: (1) proposing methods for learning state representations for reinforcement learning automatically; (2) developing generic policy classes capable of representing the large variety of control policies that are necessary for truly autonomous behavior; (3) discovering informative reward functions autonomously. Progress in each of these aspects will lift the learning algorithm to a higher level of autonomy. The advances will be grounded in the well-established theoretical framework of policy search and enabled through improvements to state-of-the-art reinforcement learning algorithms.
Ultimately, the resulting system should learn how to map raw sensory inputs to raw control signals from simple, generic principles, discovering structure within its environment automatically and solving difficult control tasks without expert knowledge. If successful, both the complete methodology developed within this project and sub-parts of it will help to establish a new, substantially more powerful generation of reinforcement learning algorithms that are capable of solving complicated robot control problems autonomously.

Project outcomes

Journal papers (5)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Manifold-based multi-objective policy search with sample reuse

DOI:
10.1016/j.neucom.2016.11.094
Published:
2017-11
Journal:
Neurocomputing
Impact factor:
6
Authors:
Simone Parisi; Matteo Pirotta; Jan Peters
Corresponding authors:
Simone Parisi; Matteo Pirotta; Jan Peters

Reinforcement learning vs human programming in tetherball robot games

DOI:
10.1109/iros.2015.7354296
Published:
2015-12
Journal:
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Impact factor:
0
Authors:
Simone Parisi; Hany Abdulsamad; A. Paraschos; Christian Daniel; Jan Peters
Corresponding authors:
Simone Parisi; Hany Abdulsamad; A. Paraschos; Christian Daniel; Jan Peters

Goal-driven dimensionality reduction for reinforcement learning

DOI:
10.1109/iros.2017.8206334
Published:
2017-09
Journal:
2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Impact factor:
0
Authors:
Simone Parisi; Simon Ramstedt; Jan Peters
Corresponding authors:
Simone Parisi; Simon Ramstedt; Jan Peters

Local-utopia policy selection for multi-objective reinforcement learning
