Collaborative Research: CIF: Medium: Statistical and Algorithmic Foundations of Distributionally Robust Policy Learning

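Basic Information

Award number:
2312204
Principal investigator:
Jose Blanchet
Amount:
$800,000
Institution:
Stanford University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-10-01 to 2027-09-30
Project status:
Ongoing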

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2312204&HistoricalAwards=false
Keywords:
Collaborative Research CIF Medium Statistical

Project Abstract

Efficient data-driven policy learning and deployment techniques are transforming many facets of our society as a result of their broad applicability in engineering, scientific and societal applications. Given access to high-performance computing, simulators and digital twins, for example, have emerged as practical alternatives for testing and learning complex optimization policies. As a result, significant scholarly effort has been devoted to this research area in the past decade. However, despite landmark progress, existing work in this area often makes a key (implicit) assumption: that the environment in which the policy is trained will be the same as the environment in which the policy is deployed. Policies learned under this assumption can be fragile, as it often does not hold in practical environments, due to either simulator model misspecification or environment shifts. The goal of this project is to study statistical and algorithmic foundations for developing provably efficient robust policy learning in unknown environments, under a possibly misspecified generative model. The project studies comprehensive statistical and algorithmic foundations for distributionally robust policy learning in contextual bandits and reinforcement learning (RL) environments and develops statistically optimal and computationally efficient algorithms across a wide range of non-parametric distributional shifts. These shifts provide a powerful framework for capturing model-agnostic environment changes but, at the same time, pose intellectual challenges, as the unknown worst-case environment lies in an infinite-dimensional space. The presented program opens up several fundamental research directions that call for novel and principled developments.
First, the project develops information-theoretic tools to understand the fundamental learning limits for distributionally robust policy learning and to characterize how the distributional uncertainty contributes to the difficulty of learning. Additionally, the project develops computationally efficient and statistically optimal estimation schemes for distributionally robust performance analysis of a given policy. Lastly, the project translates the efficiency gains in estimation due to learning a distributionally robust policy. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Jose Blanchet

Optimal Sample Complexity of Reinforcement Learning for Uniformly Ergodic Discounted Markov Decision Processes

DOI:
Year published:
2023
Journal:
arXiv.org
Impact factor:
0
Authors:
Shengbo Wang; Jose Blanchet; Peter Glynn
Corresponding author:
Peter Glynn

A Model of Bed Demand to Facilitate the Implementation of Data-driven Recommendations for COVID-19 Capacity Management

DOI:
10.21203/rs.3.rs-31953/v1
Year published:
2020
Journal:
Impact factor:
0
Authors:
Teng Zhang; Kelly A McFarlane; J. Vallon; Linying Yang; Jin Xie; Jose Blanchet; P. Glynn; Kristan Staudenmayer; K. Schulman; D. Scheinker
Corresponding author:
D. Scheinker