RI: SMALL: Robust Reinforcement Learning Using Bayesian Models

Basic Information

Award number:
1815275
Principal investigator:
Marek Petrik
Amount:
$437,800 (rounded)
Institution:
University of New Hampshire
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Period:
2018-08-15 to 2023-07-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1815275&HistoricalAwards=false
Keywords:
RI SMALL Robust Reinforcement Learning

Project Abstract

Basing decisions on data is preferable to relying on heuristics or rules of thumb. Using data effectively, however, can be challenging. In domains like agriculture or medicine, datasets are usually small, biased, and noisy. For instance, the full effects of reduced pesticide applications depend on the weather, and the impacts on yield may not be known until the harvest. Reducing pesticide applications reduces costs and provides ecological and consumer benefits, but applying too little can easily cause a crop failure and significant financial losses. These dual problems of limited data availability and a high cost of failure are also common in manufacturing, maintenance, and even robotics. Because most existing reinforcement learning methods assume large datasets, stakeholders often dismiss data-driven methods and rely on heuristics to make decisions that are apparently safe but quite sub-optimal. This research develops new robust methods for data-driven decision making that can recommend good actions that are also safe even when data is limited. The new reinforcement learning methods use prior domain knowledge to estimate the confidence in possible outcomes to prevent catastrophic failure when predictions are incorrect. The practical viability of these methods is tested on the problem of using historical data to recommend improved pesticide schedules for fruit orchards, and the results are disseminated to practitioners.
This research targets reinforcement learning problems with 1) limited or expensive data and 2) a high cost of failure. When bad decisions cause large losses, injury, or death, having confidence in a policy's quality is more important than its optimality gap. Computing high-confidence policies in reinforcement learning is difficult. Even small errors can quickly accumulate through positive feedback loops and covariate shift. Therefore, more robust methods are needed to convince practitioners to benefit from data instead of relying on heuristics.
The project combines robust optimization with model-based reinforcement learning to compute good policies that are resistant to data errors. Robust optimization has achieved successes in many areas but can be difficult to use with reinforcement learning. It requires a model of plausible uncertainty levels, so-called ambiguity sets, to properly balance a solution's quality and confidence. Constructing good ambiguity sets manually in sequential decision problems is very difficult even for robust optimization experts. This research investigates a new data-driven Bayesian approach to robust reinforcement learning. It combines hierarchical Bayesian models with robust optimization to leverage powerful hierarchical modeling techniques while avoiding the computational complexity often associated with Bayesian reinforcement learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (15)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

DOI:
Publication date:
2019-02
Journal:
ArXiv
Impact factor:
0
Authors:
Marek Petrik; R. Russel
Corresponding authors:
Marek Petrik; R. Russel

Optimizing Percentile Criterion Using Robust MDPs

DOI:
Publication date:
2021
Journal:
Proceedings of Machine Learning Research
Impact factor:
0
Authors:
Bahram Behzadian; Reazul Hasan
Corresponding authors:
Bahram Behzadian; Reazul Hasan

Fast Algorithms for L-infinity constrained S-rectangular Robust MDPs

DOI:
Publication date:
2021
Journal:
Neural Information Processing Systems
Impact factor:
0
Authors:
Bahram Behzadian; Marek Petrik
Corresponding authors:
Bahram Behzadian; Marek Petrik

Bayesian Robust Optimization for Imitation Learning

DOI:
Publication date:
2020-07
Journal:
ArXiv
Impact factor:
0
Authors:
Daniel S. Brown; S. Niekum; Marek Petrik
Corresponding authors:
Daniel S. Brown; S. Niekum; Marek Petrik

Inverse Reinforcement Learning of Interaction Dynamics from Demonstrations

DOI:
Publication date:
2019
Journal:
International Conference on Robotics and Automation (ICRA)
Impact factor:
0
Authors:
Mostafa Hussein; Momotaz Begum
Corresponding authors:
Mostafa Hussein; Momotaz Begum

Other publications by Marek Petrik

Learning Heuristic Functions through Approximate Linear Programming

DOI:
Publication date:
2008
Journal:
Impact factor:
0
Authors:
Marek Petrik; S. Zilberstein
Corresponding author:
S. Zilberstein

Agile logistics simulation and optimization for managing disaster responses

DOI:
10.1109/wsc.2013.6721698
Publication date:
2013
Journal:
2013 Winter Simulation Conference (WSC)
Impact factor:
0
Authors:
F. Barahona; M. Ettl; Marek Petrik; Peter M. Rimshnick
Corresponding author:
Peter M. Rimshnick

Interaction Structure and Dimensionality Reduction in Decentralized MDPs

DOI:
Publication date:
2008
Journal:
AAAI Conference on Artificial Intelligence
Impact factor:
0
Authors:
M. Allen; Marek Petrik; S. Zilberstein
Corresponding author:
S. Zilberstein

Learning parallel portfolios of algorithms

DOI:
Publication date:
2006
Journal:
Annals of Mathematics and Artificial Intelligence
Impact factor:
1.2
Authors:
Marek Petrik; S. Zilberstein
Corresponding author:
S. Zilberstein

Beliefs We Can Believe in: Replacing Assumptions with Data in Real-Time Search

DOI:
Publication date:
2020
Journal:
AAAI Conference on Artificial Intelligence
Impact factor:
0
Authors:
Maximilian Fickert; Tianyi Gu; Leonhard Staut; Wheeler Ruml; J. Hoffmann; Marek Petrik
Corresponding author:
Marek Petrik