CAREER: Principled Deep Reinforcement Learning for Societal Systems
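Basic information
- Award number: 2048075
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-02-01 to 2026-01-31
- Status: Ongoing
- Source:
- Keywords:
Project abstract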
The recent breakthrough in deep reinforcement learning (RL), especially its superhuman-level performance in board and video games, e.g., Go, Atari, Dota, and StarCraft, opens up new avenues for controlling many complex and unknown systems via learning. However, for practical purposes beyond game playing, deep RL still suffers from a lack of efficiency and trustworthiness. In terms of efficiency, the empirical success of deep RL requires millions to billions of data points and days to weeks of running time. In terms of trustworthiness, the empirical success of deep RL is only measured by the received reward, which does not account for safety and robustness. Such a lack of efficiency and trustworthiness is further exacerbated when we scale up deep RL to design and optimize societal systems in critical domains, e.g., healthcare, transportation, power grid, financial network, and supply chain.
This CAREER proposal addresses these challenges by establishing a theoretical framework for analyzing the computational efficiency and sample efficiency of single-agent deep RL and an algorithmic framework for achieving such efficiencies. Moreover, it leads to a stochastic game framework for achieving safety, robustness, scalability, fairness, risk-awareness, and incentivization in social systems via multi-agent deep RL. The research plan emphasizes connecting deep RL with multiple fields, e.g., nonconvex optimization, nonparametric statistics, causal inference, stochastic game, and social science. The education plan emphasizes teaching data-driven decision making as a fundamental skill for future generations, especially for future leaders, in societal contexts. In particular, it aims to promote the idea of data-driven social leadership and support underrepresented minority researchers and students, who personally experience pressing challenges in societal systems, from K-12 education to graduate training. In order to cope with the ongoing pandemic, the outreach plan involves organizing online seminars on data science and artificial intelligence, mentoring remote interns by integrating research and education, and engaging remote students via DataFest and Client Project Challenge.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Zhaoran Wang
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
- DOI: 10.48550/arxiv.2405.19332
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Shenao Zhang; Donghan Yu; Hiteshi Sharma; Ziyi Yang; Shuohang Wang; Hany Hassan; Zhaoran Wang
- Corresponding author: Zhaoran Wang
Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Shenao Zhang; Wanxin Jin; Zhaoran Wang
- Corresponding author: Zhaoran Wang
Safe MPC Alignment with Human Directional Feedback
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Zhixian Xie; Wenlong Zhang; Yi Ren; Zhaoran Wang; George J. Pappas; Wanxin Jin
- Corresponding author: Wanxin Jin
Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information
- DOI:
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Zuyue Fu; Zhengling Qi; Zhuoran Yang; Zhaoran Wang; Lan Wang
- Corresponding author: Lan Wang
Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
- DOI:
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Zuyue Fu; Zhengling Qi; Zhaoran Wang; Zhuoran Yang; Yanxun Xu; Michael R. Kosorok
- Corresponding author: Michael R. Kosorok
Other grants held by Zhaoran Wang
Collaborative Research: CIF: Medium: Learning to Control from Data: from Theory to Practice
- Award number: 2211210
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
Collaborative Research: CIF: Small: A Unified Framework of Distributional Optimization via Variational Transport
- Award number: 2008827
- Fiscal year: 2020
- Amount: $500,000
- Award type: Standard Grant
Collaborative Research: High-Dimensional Decision Making and Inference with Applications for Personalized Medicine
- Award number: 2015568
- Fiscal year: 2020
- Amount: $500,000
- Award type: Continuing Grant
Similar overseas grants
A Principled Framework for Explaining, Choosing and Negotiating Privacy Parameters of Differential Privacy
- Award number: 23K24851
- Fiscal year: 2024
- Amount: $500,000
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Principled yet practical observability for a microservices-based cloud
- Award number: 2340128
- Fiscal year: 2024
- Amount: $500,000
- Award type: Continuing Grant
CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding
- Award number: 2237640
- Fiscal year: 2023
- Amount: $500,000
- Award type: Continuing Grant
Principled phylogenomic analysis without gene tree estimation
- Award number: 2308495
- Fiscal year: 2023
- Amount: $500,000
- Award type: Standard Grant
A principled generalization of the maximum entropy principle for non-Shannon systems
- Award number: 23K16855
- Fiscal year: 2023
- Amount: $500,000
- Award type: Grant-in-Aid for Early-Career Scientists
A Principled Framework for Explaining, Choosing and Negotiating Privacy Parameters of Differential Privacy
- Award number: 22H03595
- Fiscal year: 2022
- Amount: $500,000
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Principled Approaches to Securing Next-Generation Cellular Networks
- Award number: 2145631
- Fiscal year: 2022
- Amount: $500,000
- Award type: Continuing Grant
Collaborative: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
- Award number: 2219810
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
Collaborative Research: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
- Award number: 2220345
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
NeTS: Small: Hybrid Switching in Data Center Networks: Systems-driven Modeling and Principled Algorithms
- Award number: 2309187
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant