Collaborative Research: CIF: Medium: Learning to Control from Data: from Theory to Practice

Basic Information

Award number:
2211209
Principal investigator:
Jiantao Jiao
Amount:
$800,000
Host institution:
University of California-Berkeley
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-10-01 to 2026-09-30
Project status:
Ongoing

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2211209&HistoricalAwards=false
Keywords:
Collaborative Research CIF Medium Learning

Project Abstract

Data-driven decision-making is playing an increasingly critical role in today's world with examples ranging from epidemic response to ridesharing optimization. However, learning an optimal control policy from data faces challenges in both the offline and online settings: (a) (Offline) It is unclear how to most efficiently utilize the available dataset which was collected a priori, especially when it does not cover all possible scenarios of interest. (b) (Online) It is unclear how to collect a dataset through minimal interactions with the environment in situations where it may be costly and unsafe to do so. Driven by the need to address these two challenges, this project aims to improve the sample efficiency of reinforcement learning (RL) in both settings. In addition, the project plans to incorporate adaptivity and trustworthiness that are required in practice. Activities complementary to these research thrusts include the training of future leaders of academia, industry, and government by equipping them with fundamental skills in data-driven decision making.

The goal of this project is to develop the theory and algorithms for a new generation of data-driven decision rules in order to address critical challenges in modern RL. Specifically, the research agenda aims (i) to design sample-efficient and computationally-efficient algorithms for online and offline RL with function approximation, and (ii) to enhance the adaptivity and trustworthiness of existing RL paradigms. To achieve the first goal, it is proposed to incorporate optimistic exploration for online RL and pessimistic exploitation for offline RL into existing approaches with the help of faithful uncertainty quantification for neural networks. To achieve the second goal, it is proposed to incorporate model selection into existing approaches with the help of tight sample complexity characterizations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.