Dynamic Abstraction in Reinforcement Learning
Basic Information
- Award Number: 0218125
- Principal Investigator:
- Amount: $199,600
- Host Institution:
- Host Institution Country: United States
- Project Type: Continuing Grant
- Fiscal Year: 2002
- Funding Country: United States
- Project Period: 2002-09-01 to 2005-08-31
- Project Status: Completed
- Source:
- Keywords:
Project Abstract
This project investigates reinforcement learning algorithms that use dynamic abstraction to exploit the spatial and temporal structure of complex environments to facilitate learning. The use of abstraction is one of the features of human intelligence that allows us to operate as effectively as we do in complex environments. We systematically ignore details that are not relevant to the task at hand, and we rapidly switch between abstractions as we focus on a succession of subtasks. For example, in planning everyday activities, such as driving to work, we abstract away irrelevant details such as the layout of objects inside the car; but when we actually drive, many of these details become relevant, such as the locations of the steering wheel and the accelerator. Different abstractions are appropriate for different tasks or subtasks, and the agent has to shift abstractions as it shifts to new tasks or subtasks.
This project combines the theory of options with factored state and action representations to give precise meaning to the concept of dynamic abstraction and to study methods for creating and exploiting such abstractions. It will develop formalisms for representing option models in terms of factored state and action representations by extending existing formalisms for single-step dynamic Bayes network models to the multi-time case. It will investigate how the multi-time formulation can facilitate creating and using dynamic abstractions. An algebraic theory of abstraction will be developed by extending relevant concepts from classical automata theory to multi-time factored models. Methods will be developed for learning compact multi-step option models by extending an existing mixture-model algorithm for learning transition models from the single-step to the multi-step case.
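The options framework the abstract builds on treats an option as a temporally extended action defined by an initiation set, an internal policy, and a termination condition; executing an option yields a multi-step (SMDP) outcome rather than a single transition. The sketch below is a minimal illustration of that idea only, not code from the project; all class and function names are invented for the example.

```python
# Minimal sketch of an option (initiation set I, policy pi, termination beta)
# and its multi-step execution in a toy environment.
import random

class Option:
    def __init__(self, initiation_set, policy, termination_prob):
        self.initiation_set = initiation_set      # states where the option may start
        self.policy = policy                      # state -> primitive action
        self.termination_prob = termination_prob  # state -> probability of stopping

    def available(self, state):
        return state in self.initiation_set

def run_option(option, state, step, rng, gamma=0.9):
    """Execute an option in environment `step` and return its multi-step
    outcome: (discounted reward, final state, duration in primitive steps)."""
    total_reward, discount, k = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = step(state, action)
        total_reward += discount * reward
        discount *= gamma
        k += 1
        if rng.random() < option.termination_prob(state):
            return total_reward, state, k

# Toy 1-D corridor: states 0..4, actions +1/-1, reward -1 per step.
def corridor_step(state, action):
    return max(0, min(4, state + action)), -1.0

go_right = Option(
    initiation_set={0, 1, 2, 3},
    policy=lambda s: +1,
    termination_prob=lambda s: 1.0 if s == 4 else 0.0,
)

rng = random.Random(0)
reward, final, duration = run_option(go_right, 0, corridor_step, rng)
# From state 0 the option runs 4 primitive steps and terminates at state 4.
```

The triple returned by `run_option` is exactly the kind of multi-step outcome that a multi-time option model would summarize, which is what distinguishes option models from the single-step dynamic Bayes network models the project proposes to extend.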
In general, the notion of dynamic abstraction will be a valuable tool for many difficult optimization problems in large-scale manufacturing (e.g., factory process control), robotics (e.g., navigation), multi-agent coordination, and other state-of-the-art applications of reinforcement learning. Since this research combines ideas from decision theory, operations research, control theory, cognitive science, and AI, it may provide a useful bridge that fosters contributions in all of these fields.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Andrew Barto
Other Grants by Andrew Barto
CRCNS: Collaborative Research: Neural Correlates of Hierarchical Reinforcement Learning
- Award Number: 1208051
- Fiscal Year: 2012
- Amount: $199,600
- Project Type: Continuing Grant
NRI-Small: Collaborative Research: Multiple Task Learning from Unstructured Demonstrations
- Award Number: 1208497
- Fiscal Year: 2012
- Amount: $199,600
- Project Type: Standard Grant
SGER: Building Blocks for Creative Search
- Award Number: 0733581
- Fiscal Year: 2007
- Amount: $199,600
- Project Type: Standard Grant
Collaborative Research: Intrinsically Motivated Learning in Artificial Agents
- Award Number: 0432143
- Fiscal Year: 2004
- Amount: $199,600
- Project Type: Continuing Grant
Lyapunov Methods for Reinforcement Learning
- Award Number: 0070102
- Fiscal Year: 2000
- Amount: $199,600
- Project Type: Standard Grant
KDI: Temporal Abstraction in Reinforcement Learning
- Award Number: 9980062
- Fiscal Year: 1999
- Amount: $199,600
- Project Type: Standard Grant
Multiple Time Scale Reinforcement Learning
- Award Number: 9511805
- Fiscal Year: 1995
- Amount: $199,600
- Project Type: Continuing Grant
Reinforcement Learning Algorithms Based on Dynamic Programming
- Award Number: 9214866
- Fiscal Year: 1992
- Amount: $199,600
- Project Type: Continuing Grant
Neural Networks for Adaptive Control
- Award Number: 8912623
- Fiscal Year: 1989
- Amount: $199,600
- Project Type: Continuing Grant
Conference on the Neurone as a Computational Unit, June 28--July 1, 1988, King's College, Cambridge, England
- Award Number: 8808758
- Fiscal Year: 1988
- Amount: $199,600
- Project Type: Standard Grant
Similar International Grants
An Abstraction-based Technique for Safe Reinforcement Learning
- Award Number: EP/X015823/1
- Fiscal Year: 2023
- Amount: $199,600
- Project Type: Research Grant
Multi-agent Reinforcement Learning for Cooperative Policy with Different Abstraction
- Award Number: 20K23326
- Fiscal Year: 2020
- Amount: $199,600
- Project Type: Grant-in-Aid for Research Activity Start-up
Abstraction and Generalisation in Reinforcement Learning
- Award Number: 2281998
- Fiscal Year: 2019
- Amount: $199,600
- Project Type: Studentship
Improving reinforcement learning by combining imitation with abstraction
- Award Number: 389047-2010
- Fiscal Year: 2012
- Amount: $199,600
- Project Type: Postgraduate Scholarships - Doctoral
Improving reinforcement learning by combining imitation with abstraction
- Award Number: 389047-2010
- Fiscal Year: 2011
- Amount: $199,600
- Project Type: Postgraduate Scholarships - Doctoral
Improving reinforcement learning by combining imitation with abstraction
- Award Number: 389047-2010
- Fiscal Year: 2010
- Amount: $199,600
- Project Type: Postgraduate Scholarships - Doctoral
Abstraction in reinforcement learning
- Award Number: 238988-2001
- Fiscal Year: 2003
- Amount: $199,600
- Project Type: Discovery Grants Program - Individual
Abstraction in reinforcement learning
- Award Number: 238988-2001
- Fiscal Year: 2002
- Amount: $199,600
- Project Type: Discovery Grants Program - Individual
Abstraction in reinforcement learning
- Award Number: 238988-2001
- Fiscal Year: 2001
- Amount: $199,600
- Project Type: Discovery Grants Program - Individual
Abstraction in reinforcement learning
- Award Number: 238988-2001
- Fiscal Year: 2000
- Amount: $199,600
- Project Type: Discovery Grants Program - Individual