权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Informed Exploration in Reinforcement Learning via Intuitive Physics Model Reasoning

通过直观物理模型推理进行强化学习的知情探索

基本信息

批准号：
516414603
负责人：
Professor Jan Reinhard Peters, Ph.D.
金额：
--
依托单位：
Fachgebiet Intelligente Autonome Systeme
依托单位国家：
德国
项目类别：
Research Grants
财政年份：
资助国家：
德国
起止时间：
项目状态：
未结题

来源：
https://gepris.dfg.de/gepris/projekt/516414603?language=en
关键词：
Informed Exploration Reinforcement Learning via

项目摘要

In the near future robots will perform a variety of tasks such as directing tourists or assisting the elderly. The scientific challenges of these applications is to cope with the large diversity of environments and scenarios that our physical world presents. Given this formidable diversity, programming a robot to cover in foresight all possible turns of events is doomed to failure if the robot is not able to adapt and self-improve. To foster robots’ autonomy, task-specific expertise should be accompanied by general purpose learning algorithms that ensure generalization and autonomy. These algorithms should explore and interact with the physical world in a meaningful way to constantly adapt to their changing (sub-)goals. Reinforcement learning (RL) is one such learning framework for an agent to adapt through interaction with its environment. In RL, an agent explores its environment by applying an action, gathers information by observing a state and is incentivized to adapt by collecting rewards. In recent years, RL has become an extremely powerful tool for decision-making as exhibited by its successes at surpassing human level at playing Atari video-games or defeat grand masters in the board game of Go. These agents were all trained on specific simulators designed for each task, and in each case the given simulator represented the entire reality of the agent. Moreover, these agents have been shown to break down when small changes are made in the environment that are considered insignificant to humans. Developing an RL framework that is able to quickly adapt to changes in the environment largely remains an open question, and is the core focus of this project. To improve adaptability of RL to changes in the environment, we propose to study its integration with intuitive physics models (IPMs). IPMs, or what is also referred to as common sense physics, have been a long-standing topic in AI and machine learning but their integration to RL has not been fully realized yet. In the new framework we propose in this project, the goal will be to autonomously discover and learn salient characteristics of the environment, constituting the IPMs, to explore in an informed way by means of reasoning and planning using the IPMs. The main departure from traditional model-based RL is that IPMs will not seek to equally capture all the information of the environment, but rather to learn more abstract and schematic models that will adapt better to changes in the environment. For example, an IPM will predict if a falling glass will break, but will not seek to model the exact position of glass shards after the glass broke. Identifying these key events in the environment, learning models thereof from data, and using these models in RL will be our main research directions. Developing RL algorithms that can generalize to all such task variations will be a major contribution to both the machine learning and robotics communities.

在不久的将来，机器人将执行各种任务，如指导游客或帮助老人。这些应用程序的科学挑战是应对我们的物理世界所呈现的各种环境和场景。鉴于这种令人生畏的多样性，如果机器人不能适应和自我改进，那么为机器人编程以预见所有可能的事件转折注定会失败。为了培养机器人的自主性，特定任务的专业知识应该伴随着通用的学习算法，以确保泛化和自主性。这些算法应该以有意义的方式探索物理世界并与之交互，以不断适应其不断变化的（子）目标。强化学习（RL）就是这样一种学习框架，让智能体通过与环境的交互来适应。在强化学习中，智能体通过应用一个动作来探索它的环境，通过观察一个状态来收集信息，并通过收集奖励来激励它去适应。近年来，强化学习已经成为一种非常强大的决策工具，它在玩雅达利视频游戏时超越人类水平或在围棋棋盘游戏中击败大师的成功都证明了这一点。这些代理都是在为每个任务设计的特定模拟器上训练的，在每种情况下，给定的模拟器都代表了代理的整个现实。此外，当环境发生被认为对人类无关紧要的微小变化时，这些代理人就会被分解。开发一个能够快速适应环境变化的RL框架在很大程度上仍然是一个悬而未决的问题，也是该项目的核心焦点。为了提高RL对环境变化的适应性，我们建议研究其与直观物理模型（IPM）的集成。IPM，或者也被称为常识物理学，一直是人工智能和机器学习中的一个长期话题，但它们与RL的集成尚未完全实现。在我们提出的新框架中，我们的目标是自主发现和学习环境的显著特征，构成IPM，通过使用IPM的推理和规划以知情的方式进行探索。与传统的基于模型的强化学习的主要区别在于，IPM不会寻求平等地捕获环境的所有信息，而是学习更抽象和示意性的模型，这些模型将更好地适应环境的变化。例如，IPM将预测掉落的玻璃是否会破碎，但不会试图对玻璃破碎后玻璃碎片的确切位置进行建模。识别环境中的这些关键事件，从数据中学习模型，并将这些模型用于RL将是我们的主要研究方向。开发可以推广到所有这些任务变化的RL算法将是对机器学习和机器人社区的重大贡献。