Collaborative Research: CIF: Medium: Learning to Control from Data: from Theory to Practice

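Basic Information

Award number:
2211210
Principal investigator:
Zhaoran Wang
Amount:
approximately $398,900
Host institution:
Northwestern University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-10-01 to 2026-09-30
Project status:
Ongoing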

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2211210&HistoricalAwards=false
Keywords:
Collaborative Research CIF Medium Learning

Project Abstract

Data-driven decision-making is playing an increasingly critical role in today's world with examples ranging from epidemic response to ridesharing optimization. However, learning an optimal control policy from data faces challenges in both the offline and online settings: (a) (Offline) It is unclear how to most efficiently utilize the available dataset which was collected a priori, especially when it does not cover all possible scenarios of interest. (b) (Online) It is unclear how to collect a dataset through minimal interactions with the environment in situations where it may be costly and unsafe to do so. Driven by the need to address these two challenges, this project aims to improve the sample efficiency of reinforcement learning (RL) in both settings. In addition, the project plans to incorporate adaptivity and trustworthiness that are required in practice. Activities complementary to these research thrusts include the training of future leaders of academia, industry, and government by equipping them with fundamental skills in data-driven decision making.

The goal of this project is to develop the theory and algorithms for a new generation of data-driven decision rules in order to address critical challenges in modern RL. Specifically, the research agenda aims (i) to design sample-efficient and computationally-efficient algorithms for online and offline RL with function approximation, and (ii) to enhance the adaptivity and trustworthiness of existing RL paradigms. To achieve the first goal, we propose to incorporate optimistic exploration for online RL and pessimistic exploitation for offline RL into existing approaches with the help of faithful uncertainty quantification for neural networks. To achieve the second goal, we propose to incorporate model selection into existing approaches with the help of tight sample complexity characterizations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
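
The contrast between optimistic exploration (online) and pessimistic exploitation (offline) named in the abstract can be illustrated on a toy multi-armed bandit. This is a minimal sketch and not the project's method: the function names, the Hoeffding-style confidence width, and the sample numbers are all assumptions chosen for illustration, and the abstract's actual setting (RL with function approximation and neural-network uncertainty estimates) is far more general.

```python
import math


def confidence_width(count, delta=0.05):
    """Hoeffding-style half-width for rewards in [0, 1] (illustrative assumption).

    Returns infinity when an action has never been tried.
    """
    if count == 0:
        return math.inf
    return math.sqrt(math.log(2.0 / delta) / (2.0 * count))


def optimistic_action(means, counts):
    """Online setting: choose the action with the highest upper confidence bound,
    so poorly explored actions get tried."""
    scores = [m + confidence_width(n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def pessimistic_action(means, counts):
    """Offline setting: choose the action with the highest lower confidence bound,
    so actions the dataset barely covers are avoided."""
    scores = [m - confidence_width(n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical data: action 0 is well covered, action 1 has only 3 samples.
means = [0.60, 0.70]
counts = [500, 3]

print(optimistic_action(means, counts))   # 1: large uncertainty is treated as a bonus
print(pessimistic_action(means, counts))  # 0: large uncertainty is treated as a penalty
```

The same empirical estimates lead to opposite choices: the online learner can afford to gather more data on the uncertain action, while the offline learner cannot and so falls back on what the dataset supports.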

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Zhaoran Wang

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

DOI:
10.48550/arxiv.2405.19332
Publication year:
2024
Journal:
ArXiv
Impact factor:
0
Authors:
Shenao Zhang;Donghan Yu;Hiteshi Sharma;Ziyi Yang;Shuohang Wang;Hany Hassan;Zhaoran Wang
Corresponding author:
Zhaoran Wang