CAREER: Soft-robust Methods for Offline Reinforcement Learning
Basic Information
- Award Number: 2144601
- Principal Investigator:
- Amount: $575,900
- Host Institution:
- Host Institution Country: United States
- Award Type: Continuing Grant
- Fiscal Year: 2022
- Funding Country: United States
- Project Period: 2022-09-01 to 2027-08-31
- Project Status: Active
- Source:
- Keywords:
Project Abstract
Improvements in sensors, data collection, and computational power have driven the desire to harness data to improve decision-making in domains such as precision agriculture and medicine. Making effective data-driven decisions is a challenging problem studied by the reinforcement learning community. Despite their success in simulated domains, like board games, existing reinforcement learning algorithms are still too unreliable for widespread deployment. This project develops new reinforcement learning algorithms that achieve reliability by carefully balancing the expected quality of recommended decisions with their risk of failure. The risk of failure is measured using techniques from stochastic finance, which enable flexible and efficient algorithms. These new, reliable algorithms will help bring data-driven decision-making to new domains, including agriculture, ecological management, and medicine. The research is integrated with educational activities that provide graduate and undergraduate students with training opportunities and new study materials, including a textbook.
This research develops and analyzes algorithms for reinforcement learning (RL) problems with 1) limited or expensive data and 2) a high cost of failure. Soft-robust objectives build on convex risk measures to balance the robustness and average quality of data-driven decisions. While soft-robust objectives are well understood in single-stage optimization problems, many fundamental and practical questions remain open in multi-stage problems like RL. This project will answer these questions and develop reliable, tractable, and scalable algorithms by tackling three main objectives. First, the project will establish the statistical and computational properties of soft-robust RL formulations. Second, the project will generate a new class of tabular soft-robust RL algorithms built on new insights into the relationship between soft-robustness and robust Markov decision processes. Third, the project will scale the tabular algorithms to value-function approximation and gradient-style methods. The project will address both batch RL, with known rewards and estimated transition probabilities, and inverse RL, with estimated rewards and known transition probabilities. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
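The abstract describes soft-robust objectives as convex-risk-measure blends of average and worst-case performance, and one of the project's listed papers studies entropic risk in discounted MDPs. As a minimal hypothetical sketch (the function names, the choice of the entropic risk measure, and the mixing weight are illustrative assumptions, not details taken from the project), a soft-robust score for a policy evaluated under several plausible models might look like:

```python
import numpy as np

def entropic_risk(returns, beta):
    # Entropic risk measure: -(1/beta) * log E[exp(-beta * X)].
    # As beta -> 0 this recovers the mean; large beta weights bad
    # outcomes heavily and approaches the worst case.
    returns = np.asarray(returns, dtype=float)
    return -np.log(np.mean(np.exp(-beta * returns))) / beta

def soft_robust_value(returns, weight=0.5, beta=2.0):
    # Convex combination of the average return and a convex risk
    # measure, computed over returns of one policy under several
    # sampled (plausible) models of the environment.
    returns = np.asarray(returns, dtype=float)
    return weight * returns.mean() + (1.0 - weight) * entropic_risk(returns, beta)

# Returns of a single policy under five sampled models:
rets = [1.0, 0.9, 1.1, 0.2, 1.0]
print(soft_robust_value(rets, weight=0.5, beta=2.0))
```

Setting `weight=1` recovers the risk-neutral average over models, while `weight=0` with a large `beta` approaches the worst case; the convex combination interpolates between expected quality and robustness, which is the balance the abstract describes.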
Project Outcomes
Journal Articles (3)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Solving multi-model MDPs by coordinate ascent and dynamic programming
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Xihong Su, Marek Petrik
- Corresponding Authors: Xihong Su, Marek Petrik
Entropic Risk Optimization in Discounted MDPs
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: J. Hau;Marek Petrik;M. Ghavamzadeh
- Corresponding Authors: J. Hau;Marek Petrik;M. Ghavamzadeh
Policy Gradient in Robust MDPs with Global Convergence Guarantee
- DOI:
- Publication Date: 2022-12
- Journal:
- Impact Factor: 0
- Authors: Qiuhao Wang;C. Ho;Marek Petrik
- Corresponding Authors: Qiuhao Wang;C. Ho;Marek Petrik
Other Publications by Marek Petrik
Learning Heuristic Functions through Approximate Linear Programming
- DOI:
- Publication Date: 2008
- Journal:
- Impact Factor: 0
- Authors: Marek Petrik;S. Zilberstein
- Corresponding Author: S. Zilberstein
Agile logistics simulation and optimization for managing disaster responses
- DOI: 10.1109/wsc.2013.6721698
- Publication Date: 2013
- Journal:
- Impact Factor: 0
- Authors: F. Barahona;M. Ettl;Marek Petrik;Peter M. Rimshnick
- Corresponding Author: Peter M. Rimshnick
Interaction Structure and Dimensionality Reduction in Decentralized MDPs
- DOI:
- Publication Date: 2008
- Journal:
- Impact Factor: 0
- Authors: M. Allen;Marek Petrik;S. Zilberstein
- Corresponding Author: S. Zilberstein
Learning parallel portfolios of algorithms
- DOI:
- Publication Date: 2006
- Journal:
- Impact Factor: 1.2
- Authors: Marek Petrik;S. Zilberstein
- Corresponding Author: S. Zilberstein
Beliefs We Can Believe in: Replacing Assumptions with Data in Real-Time Search
- DOI:
- Publication Date: 2020
- Journal:
- Impact Factor: 0
- Authors: Maximilian Fickert;Tianyi Gu;Leonhard Staut;Wheeler Ruml;J. Hoffmann;Marek Petrik
- Corresponding Author: Marek Petrik
Other Grants by Marek Petrik
RI: SMALL: Robust Reinforcement Learning Using Bayesian Models
- Award Number: 1815275
- Fiscal Year: 2018
- Amount: $575,900
- Award Type: Standard Grant
III: Small: Robust Reinforcement Learning for Invasive Species Management
- Award Number: 1717368
- Fiscal Year: 2017
- Amount: $575,900
- Award Type: Standard Grant
Similar Overseas Grants
I-Corps: Translation potential of stereolithography 3D printing to create soft elastomers
- Award Number: 2414710
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Collaborative Research: RUI: IRES Track I: From fundamental to applied soft matter: research experiences in Mexico
- Award Number: 2426728
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Maneuvering Bioinspired Soft Microrobots in Anisotropic Complex Fluids
- Award Number: 2323917
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Reverse Design of Tuneable 4D Printed Materials for Soft Robotics
- Award Number: DE240100960
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Discovery Early Career Researcher Award
Organic Bionics: Soft Materials to Solve Hard Problems in Neuroengineering
- Award Number: FT230100154
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: ARC Future Fellowships
Using soft X-ray coherent diffraction imaging to study and tailor the formation of superfluid helium droplets and quantum vortices within them
- Award Number: 23K28359
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Grant-in-Aid for Scientific Research (B)
Instrument Development: A lab-scale soft X-ray microscope for biological systems
- Award Number: EP/Z53108X/1
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Research Grant
Princeton-Oxford-Cambridge Centre-to-Centre Collaboration on Soft Functional Energy Materials
- Award Number: EP/Z531303/1
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Research Grant
CAREER: Informed Testing — From Full-Field Characterization of Mechanically Graded Soft Materials to Student Equity in the Classroom
- Award Number: 2338371
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Standard Grant
Revolutionary Soft Surfboards - Advanced UK low carbon manufacturing for enhanced durability and 100% recyclability
- Award Number: 10095272
- Fiscal Year: 2024
- Amount: $575,900
- Award Type: Collaborative R&D