CAREER: Principled Deep Reinforcement Learning for Societal Systems

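Basic Information

Award Number:
2048075
Principal Investigator:
Zhaoran Wang
Amount:
$500,000
Institution:
Northwestern University
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2021
Funding Country:
United States
Period:
2021-02-01 to 2026-01-31
Status:
Active (not yet closed)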

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2048075&HistoricalAwards=false
Keywords:
CAREER Principled Deep Reinforcement Learning

Abstract

The recent breakthrough in deep reinforcement learning (RL), especially its superhuman-level performance in board and video games such as Go, Atari, Dota, and StarCraft, opens up new avenues for controlling many complex and unknown systems via learning. However, for practical purposes beyond game playing, deep RL still suffers from a lack of efficiency and trustworthiness. In terms of efficiency, the empirical success of deep RL requires millions to billions of data points and days to weeks of running time. In terms of trustworthiness, the empirical success of deep RL is measured only by the received reward, which does not account for safety and robustness. This lack of efficiency and trustworthiness is further exacerbated when deep RL is scaled up to design and optimize societal systems in critical domains such as healthcare, transportation, power grids, financial networks, and supply chains.

This CAREER proposal addresses these challenges by establishing a theoretical framework for analyzing the computational efficiency and sample efficiency of single-agent deep RL, together with an algorithmic framework for achieving such efficiencies. Moreover, it develops a stochastic game framework for achieving safety, robustness, scalability, fairness, risk-awareness, and incentivization in social systems via multi-agent deep RL. The research plan emphasizes connecting deep RL with multiple fields, including nonconvex optimization, nonparametric statistics, causal inference, stochastic games, and social science. The education plan emphasizes teaching data-driven decision making in societal contexts as a fundamental skill for future generations, especially future leaders. In particular, it aims to promote the idea of data-driven social leadership and to support underrepresented minority researchers and students, from K-12 education to graduate training, who personally experience pressing challenges in societal systems. To cope with the ongoing pandemic, the outreach plan involves organizing online seminars on data science and artificial intelligence, mentoring remote interns by integrating research and education, and engaging remote students via DataFest and the Client Project Challenge.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other Publications by Zhaoran Wang

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

DOI:
10.48550/arxiv.2405.19332
Published:
2024
Journal:
ArXiv
Impact Factor:
0
Authors:
Shenao Zhang; Donghan Yu; Hiteshi Sharma; Ziyi Yang; Shuohang Wang; Hany Hassan; Zhaoran Wang
Corresponding Author:
Zhaoran Wang