
RI: Small: Reinforcement Learning with Predictive State Representations


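Basic Information

Award number:
1319365
Principal investigator:
Satinder Baveja
Amount:
$450,000
Awardee institution:
Regents of the University of Michigan - Ann Arbor
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2013
Funding country:
United States
Project period:
2013-08-01 to 2018-07-31
Project status:
Completed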

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1319365&HistoricalAwards=false
Keywords:
RI Small Reinforcement Learning Predictive

Project Abstract

Like animals and humans, artificial autonomous agents that are able to predict short-term and long-term consequences of their actions can then plan their behavior, act more intelligently, and achieve greater reward. Agents that can learn such predictive models from experience can be more robust in their intelligence than agents that rely on pre-built models. The PI and graduate students are focused on the particularly challenging but natural case where observations from the agent's sensors far in the past can continue to influence the predictions of consequences of actions long into the future. (For example, the observation of where you park the car in the morning will help predict where you will see the car later in the day.) There are two broad classes of approaches to learning predictive models in such 'partially observable' settings. Finite-history models use short-term history of observations to predict future observations conditioned on actions; these are fast to learn but are limited because they cannot capture the effects of long-term history. Latent-variable models can capture the effects of long-term history by positing hidden or latent variables that capture the true state of the environment (e.g., the location of the car), but such models are difficult to learn because the latent variables have to be inferred from data. This project builds on previous work by the PI and others on a third approach, called Predictive State Representations (or PSRs), in which the agent maintains predictions of future observations conditioned on future actions as a summary-representation of history; these models can both be fast to learn and capture the effect of long-term history. This project develops new PSR-based methods and algorithms for hierarchical models, rich-feature-based models, and local and modular models. The project applies the new methods to challenging applications from active perception and robotics. 
In addition, theoretical understanding of these richer and newer methods will be developed. Altogether the project significantly expands the applicability of PSR-methods as well as their theoretical foundations and algorithms.

Broader Impacts: New methods that allow artificial agents to robustly build predictive models would advance the state of knowledge across the fields of artificial intelligence, reinforcement learning, control, operations research, psychology, and neuroscience. The PI is co-leading an effort to create a new undergraduate degree in Data Sciences at the University of Michigan to be jointly managed by Computer Science & Engineering and Statistics. This future degree as well as other current undergraduate research programs will be targeted to recruit, mentor, and train students for this project.
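
The abstract describes a PSR as a set of predictions about future observations, conditioned on future actions, that stands in for the history. The following is a minimal sketch of that idea for the standard linear PSR formulation, not the project's own code. Everything concrete in it is an assumption made for illustration: the two-state environment, the transition and observation probabilities in `T` and `O`, the action names `a0` and `a1`, and the choice of core tests in `CORE`. The hidden-state model is used only to derive the PSR weights and to check the result; the PSR update itself touches nothing but the prediction vector.

```python
# Hypothetical 2-state, 2-action, 2-observation environment (illustration only).
T = {"a0": [[0.9, 0.1], [0.2, 0.8]],   # T[a][i][j] = P(next state j | state i, action a)
     "a1": [[0.3, 0.7], [0.6, 0.4]]}
O = [[0.8, 0.2], [0.1, 0.9]]           # O[j][o] = P(observation o | next state j)

def step_matrix(a, o):
    """A[i][j] = P(next state j and observation o | state i, action a)."""
    return [[T[a][i][j] * O[j][o] for j in range(2)] for i in range(2)]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def outcome(test):
    """u[i] = P(the test's observations | start in state i, the test's actions)."""
    u = [1.0, 1.0]
    for a, o in reversed(test):
        u = mat_vec(step_matrix(a, o), u)
    return u

# Core tests (assumed): "do a0, see 0" and "do a1, see 0".
CORE = [(("a0", 0),), (("a1", 0),)]
U = [[outcome(q)[i] for q in CORE] for i in range(2)]  # columns are core tests

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

U_INV = inv2(U)

def weights(test):
    """Weight vector m_t such that P(test | history) = dot(psr_state, m_t)."""
    return mat_vec(U_INV, outcome(test))

def psr_update(p, a, o):
    """New prediction for each core test q: P(a o q | h) / P(a o | h)."""
    denom = dot(p, weights(((a, o),)))
    return [dot(p, weights(((a, o),) + q)) / denom for q in CORE]

# Reference computation with an explicit latent-state belief, used only as a check.
def belief_update(b, a, o):
    A = step_matrix(a, o)
    nb = [sum(b[i] * A[i][j] for i in range(2)) for j in range(2)]
    z = sum(nb)
    return [x / z for x in nb]

def psr_from_belief(b):
    return [dot(b, [U[0][k], U[1][k]]) for k in range(2)]

b = [0.5, 0.5]
initial_p = psr_from_belief(b)
p = list(initial_p)
for a, o in [("a0", 0), ("a1", 1), ("a1", 0), ("a0", 1)]:
    p = psr_update(p, a, o)   # PSR: only predictions are tracked
    b = belief_update(b, a, o)  # latent-variable view, for comparison
print(p, psr_from_belief(b))
```

The two printed vectors should agree up to floating-point error, which illustrates the point made in the abstract: the prediction vector carries the same information about the future as the hidden state, without the agent ever representing that state explicitly.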
