CAREER: Stochasticity and Resilience in Reinforcement Learning: From Single to Multiple Agents

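Basic information

  • Award number:
    2339794
  • Principal investigator:
  • Amount:
    $532.9K
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Project period:
    2024-03-01 to 2029-02-28
  • Project status:
    Ongoing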

Project abstract

Reinforcement Learning (RL) has emerged as a promising data-driven paradigm for learning to control unknown and complex systems. It has achieved impressive success in simulated environments such as games. However, for applications in real-world engineering systems, existing RL algorithms and theory fall short of addressing three fundamental challenges: high stochasticity, long-horizon regimes, and vulnerability to model uncertainty. These challenges are exacerbated in systems with multiple strategic agents. The goal of this CAREER project is to advance the algorithmic and theoretical foundations of RL by addressing these challenges, and to enable efficient and resilient RL-based control in engineering systems. This project will particularly focus on applications in computer and communication networks, which will guide the problem formulation, methodology development, and evaluation. The project is enhanced by an education plan that aims to offer students from K–12 to college a pathway to obtain experience and training in RL and machine learning more broadly, as well as in their applications in engineering systems. This project will also support a mentoring program for students from underrepresented groups in STEM.

The research work in this project will address the aforementioned challenges via three technical thrusts. Thrust 1 studies finite-time convergence of various iterative algorithms that arise in RL through the unified variational inequality framework, by leveraging tools from modern Markov chain theory. In Thrust 2, we will develop techniques to tame the high stochasticity in long-horizon problems, and further develop RL algorithms that provably learn a stable and near-optimal policy. Thrust 3 studies scalable multi-agent RL through the frameworks of mean-field games and graphon games, as well as the game-theoretic foundations of robust Markov games under model uncertainty. The developed RL algorithms will be implemented and evaluated in a broad range of decision-making problems in computer and communication networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Qiaomin Xie

On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis
  • DOI:
  • Year published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Devavrat Shah; Qiaomin Xie; Zhi Xu
  • Corresponding author:
    Zhi Xu
On Faking a Nash Equilibrium
  • DOI:
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Young Wu; Jeremy McMahan; Xiaojin Zhu; Qiaomin Xie
  • Corresponding author:
    Qiaomin Xie
Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA
  • DOI:
  • Year published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yixuan Zhang; D. Huo; Yudong Chen; Qiaomin Xie
  • Corresponding author:
    Qiaomin Xie
Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption
  • DOI:
  • Year published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yiding Chen; Xuezhou Zhang; Qiaomin Xie; Xiaojin Zhu; UW
  • Corresponding author:
    UW
Optimal Attack and Defense for Reinforcement Learning
  • DOI:
    10.1609/aaai.v38i13.29346
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jeremy McMahan; Young Wu; Xiaojin Zhu; Qiaomin Xie
  • Corresponding author:
    Qiaomin Xie

Other grants of Qiaomin Xie

Travel: Student Travel Grant for the 2024 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
  • Award number:
    2412676
  • Fiscal year:
    2024
  • Award amount:
    $532.9K
  • Award type:
    Standard Grant

Similar international grants

eMB: Collaborative Research: Stochasticity in ovarian aging and biotechnologies for menopause delay
  • Award number:
    2325259
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Standard Grant
Collaborative Research: BoCP-Design: US-Sao Paulo: The roles of stochasticity and spatial context in dynamics of functional diversity under global change
  • Award number:
    2225096
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Standard Grant
Collaborative Research: BoCP-Design: US-Sao Paulo: The roles of stochasticity and spatial context in dynamics of functional diversity under global change
  • Award number:
    2225098
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Standard Grant
eMB: Collaborative Research: Stochasticity in ovarian aging and biotechnologies for menopause delay
  • Award number:
    2325258
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Standard Grant
Integrative analysis of the stochasticity of single-cell omics data for predicting pioneerness of transcription factors
  • Award number:
    23K14165
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Quantifying the cognitive processes supporting computations of stochasticity and volatility in humans
  • Award number:
    10732422
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
Imaging metrics of neuronal stochasticity and brain resilience
  • Award number:
    489191
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Operating Grants
Building a synthetic chemical synapse through harnessed stochasticity
  • Award number:
    DE230100684
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Discovery Early Career Researcher Award
Collaborative Research: BoCP-Design: US-Sao Paulo: The roles of stochasticity and spatial context in dynamics of functional diversity under global change
  • Award number:
    2225097
  • Fiscal year:
    2023
  • Award amount:
    $532.9K
  • Award type:
    Standard Grant
Stochasticity in Approximate Dynamic Programming
  • Award number:
    RGPIN-2020-04301
  • Fiscal year:
    2022
  • Award amount:
    $532.9K
  • Award type:
    Discovery Grants Program - Individual