
CAREER: Stochasticity and Resilience in Reinforcement Learning: From Single to Multiple Agents


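Basic Information

Award number:
2339794
Principal investigator:
Qiaomin Xie
Award amount:
Approximately $532,900
Host institution:
University of Wisconsin-Madison
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Period of performance:
2024-03-01 to 2029-02-28
Project status:
Active (not yet concluded)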

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339794&HistoricalAwards=false
Keywords:
CAREER; Stochasticity; Resilience; Reinforcement Learning

Abstract

Reinforcement Learning (RL) has emerged as a promising data-driven paradigm for learning to control unknown and complex systems. It has achieved impressive success in simulated environments such as games. However, for applications in real-world engineering systems, existing RL algorithms and theory fall short of addressing three fundamental challenges: high stochasticity, long-horizon regimes, and vulnerability to model uncertainty. These challenges are exacerbated in systems with multiple strategic agents. The goal of this CAREER project is to advance the algorithmic and theoretical foundations of RL by addressing these challenges, and to enable efficient and resilient RL-based control in engineering systems. This project will particularly focus on applications in computer and communication networks, which will guide the problem formulation, methodology development, and evaluation. The project is enhanced by an education plan that aims to offer students from K–12 to college a pathway to obtain experience and training in RL and in machine learning more broadly, as well as in their applications in engineering systems. This project will also support a mentoring program for students from underrepresented groups in STEM.

The research work in this project will address the aforementioned challenges via three technical thrusts. Thrust 1 studies finite-time convergence of various iterative algorithms that arise in RL through a unified variational inequality framework, leveraging tools from modern Markov chain theory. In Thrust 2, we will develop techniques to tame the high stochasticity in long-horizon problems, and further develop RL algorithms that provably learn a stable and near-optimal policy. Thrust 3 studies scalable multi-agent RL through the frameworks of mean-field games and graphon games, as well as the game-theoretic foundations of robust Markov games under model uncertainty. The developed RL algorithms will be implemented and evaluated in a broad range of decision-making problems in computer and communication networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
