Exploiting Structure in Reinforcement Learning Problems

Basic Information

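Award Number:
9711753
Principal Investigator:
Satinder Baveja
Amount:
$229,700
Institution:
University of Colorado at Boulder
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
1997
Funding Country:
United States
Period:
1997-12-01 to 1998-11-30
Status:
Completed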

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9711753&HistoricalAwards=false
Keywords:
Exploiting Structure Reinforcement Learning Problems

Abstract

Algorithms for learning by interaction, or reinforcement learning, typically ignore all structure in the environment and consequently tend to scale poorly. The goal of this research is to develop novel, efficient, and theoretically well-founded algorithms and architectures for learning by interaction in structured environments. Three kinds of environmental structure are considered: factorial structure in states and actions, additive structure in payoff functions, and hierarchical structure in states and actions. Such structure is common because many environments are composed of multiple weakly interacting components that are often organized hierarchically. The approach consists of exploiting this structure by learning separately for the different components and then compensating, in a structure-dependent manner, for the approximation so introduced. The results of this research will elucidate many interesting and useful structures common in learning-by-interaction problems and provide new reinforcement learning algorithms that make it possible to solve significantly larger structured problems than is possible with the traditional approach. Possible applications include large-scale, dynamic resource allocation problems in telecommunications, networking, and scheduling, as well as multi-agent problems from distributed control and artificial intelligence.
