CAREER: Soft-robust Methods for Offline Reinforcement Learning
Basic Information
- Award Number: 2144601
- Principal Investigator:
- Amount: $575,900
- Host Institution:
- Host Institution Country: United States
- Award Type: Continuing Grant
- Fiscal Year: 2022
- Funding Country: United States
- Project Period: 2022-09-01 to 2027-08-31
- Project Status: Active
- Source:
- Keywords:
Project Abstract
Improvements in sensors, data collection, and computational power have driven the desire to harness data to improve decision-making in domains such as precision agriculture and medicine. Making effective data-driven decisions is a challenging problem studied by the reinforcement learning community. Despite their success in simulated domains, like board games, existing reinforcement learning algorithms are still too unreliable for widespread deployment. This project develops new reinforcement learning algorithms that achieve reliability by carefully balancing the expected quality of recommended decisions with their risk of failure. The risk of failure is measured using techniques from stochastic finance, which enable flexible and efficient algorithms. These new, reliable algorithms will help bring data-driven decision-making to new domains, including agriculture, ecological management, and medicine. The research is integrated with educational activities that provide graduate and undergraduate students with training opportunities and new study materials, including a textbook.
This research develops and analyzes algorithms for reinforcement learning (RL) problems with 1) limited or expensive data and 2) a high cost of failure. Soft-robust objectives build on convex risk measures to balance the robustness and average quality of data-driven decisions. While soft-robust objectives are well understood in single-stage optimization problems, many fundamental and practical questions remain open in multi-stage problems like RL. This project will answer these questions and develop reliable, tractable, and scalable algorithms by tackling three main objectives. First, the project will establish the statistical and computational properties of soft-robust RL formulations. Second, the project will generate a new class of tabular soft-robust RL algorithms built on new insights into the relationship between soft-robustness and robust Markov decision processes. Third, the project will scale the tabular algorithms to value-function approximation and gradient-style methods. The project will address both batch RL, with known rewards and estimated transition probabilities, and inverse RL, with estimated rewards and known transition probabilities. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
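The abstract describes soft-robust objectives as convex-risk-measure blends of average and worst-case performance, and one of the project's listed papers studies entropic risk in discounted MDPs. As a minimal hypothetical sketch (the function names, the choice of the entropic risk measure, and the mixing weight are illustrative assumptions, not details taken from the project), a soft-robust score for a policy evaluated under several plausible models might look like:

```python
import numpy as np

def entropic_risk(returns, beta):
    # Entropic risk measure: -(1/beta) * log E[exp(-beta * X)].
    # As beta -> 0 this recovers the mean; large beta weights bad
    # outcomes heavily and approaches the worst case.
    returns = np.asarray(returns, dtype=float)
    return -np.log(np.mean(np.exp(-beta * returns))) / beta

def soft_robust_value(returns, weight=0.5, beta=2.0):
    # Convex combination of the average return and a convex risk
    # measure, computed over returns of one policy under several
    # sampled (plausible) models of the environment.
    returns = np.asarray(returns, dtype=float)
    return weight * returns.mean() + (1.0 - weight) * entropic_risk(returns, beta)

# Returns of a single policy under five sampled models:
rets = [1.0, 0.9, 1.1, 0.2, 1.0]
print(soft_robust_value(rets, weight=0.5, beta=2.0))
```

Setting `weight=1` recovers the risk-neutral average over models, while `weight=0` with a large `beta` approaches the worst case; the convex combination interpolates between expected quality and robustness, which is the balance the abstract describes.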
Project Outcomes
Journal Articles (3)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Solving multi-model MDPs by coordinate ascent and dynamic programming
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Xihong Su, Marek Petrik
- Corresponding Authors: Xihong Su, Marek Petrik
Entropic Risk Optimization in Discounted MDPs
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: J. Hau;Marek Petrik;M. Ghavamzadeh
- Corresponding Authors: J. Hau;Marek Petrik;M. Ghavamzadeh
Policy Gradient in Robust MDPs with Global Convergence Guarantee
- DOI:
- Publication Date: 2022-12
- Journal:
- Impact Factor: 0
- Authors: Qiuhao Wang;C. Ho;Marek Petrik
- Corresponding Authors: Qiuhao Wang;C. Ho;Marek Petrik
Other Publications by Marek Petrik
Learning Heuristic Functions through Approximate Linear Programming
- DOI:
- Publication Date: 2008
- Journal:
- Impact Factor: 0
- Authors: Marek Petrik;S. Zilberstein
- Corresponding Author: S. Zilberstein
Agile logistics simulation and optimization for managing disaster responses
- DOI: 10.1109/wsc.2013.6721698
- Publication Date: 2013
- Journal:
- Impact Factor: 0
- Authors: F. Barahona;M. Ettl;Marek Petrik;Peter M. Rimshnick
- Corresponding Author: Peter M. Rimshnick
Interaction Structure and Dimensionality Reduction in Decentralized MDPs
- DOI:
- Publication Date: 2008
- Journal:
- Impact Factor: 0
- Authors: M. Allen;Marek Petrik;S. Zilberstein
- Corresponding Author: S. Zilberstein
Learning parallel portfolios of algorithms
- DOI:
- Publication Date: 2006
- Journal:
- Impact Factor: 1.2
- Authors: Marek Petrik;S. Zilberstein
- Corresponding Author: S. Zilberstein
Beliefs We Can Believe in: Replacing Assumptions with Data in Real-Time Search
- DOI:
- Publication Date: 2020
- Journal:
- Impact Factor: 0
- Authors: Maximilian Fickert;Tianyi Gu;Leonhard Staut;Wheeler Ruml;J. Hoffmann;Marek Petrik
- Corresponding Author: Marek Petrik
Other Grants by Marek Petrik
RI: SMALL: Robust Reinforcement Learning Using Bayesian Models
- Award Number: 1815275
- Fiscal Year: 2018
- Amount: $575,900
- Award Type: Standard Grant
III: Small: Robust Reinforcement Learning for Invasive Species Management
- Award Number: 1717368
- Fiscal Year: 2017
- Amount: $575,900
- Award Type: Standard Grant
Similar Overseas Grants
I-Corps: Translation potential of stereolithography 3D printing to create soft elastomers
- Award Number: 2414710
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Collaborative Research: RUI: IRES Track I: From fundamental to applied soft matter: research experiences in Mexico
- Award Number: 2426728
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Maneuvering Bioinspired Soft Microrobots in Anisotropic Complex Fluids
- Award Number: 2323917
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Reverse Design of Tuneable 4D Printed Materials for Soft Robotics
- Award Number: DE240100960
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Discovery Early Career Researcher Award
Organic Bionics: Soft Materials to Solve Hard Problems in Neuroengineering
- Award Number: FT230100154
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: ARC Future Fellowships
Using soft X-ray coherent diffraction imaging to study and tailor the formation of superfluid helium droplets and quantum vortices within them
- Award Number: 23K28359
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Grant-in-Aid for Scientific Research (B)
Instrument Development: A lab-scale soft X-ray microscope for biological systems
- Award Number: EP/Z53108X/1
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Research Grant
Princeton-Oxford-Cambridge Centre-to-Centre Collaboration on Soft Functional Energy Materials
- Award Number: EP/Z531303/1
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Research Grant
CAREER: Informed Testing — From Full-Field Characterization of Mechanically Graded Soft Materials to Student Equity in the Classroom
- Award Number: 2338371
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Revolutionary Soft Surfboards - Advanced UK low carbon manufacturing for enhanced durability and 100% recyclability
- Award Number: 10095272
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Collaborative R&D