CAREER: Reinforcement Learning-Based Control of Heterogeneous Multi-Agent Systems in Structured Environments: Algorithms and Complexity

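Basic information

Award number:
2237830
Principal investigator:
Yi Zhou
Amount:
$541,000 (approx.)
Host institution:
University of Utah
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Period:
2023-07-01 to 2028-06-30
Project status:
Ongoing

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2237830&HistoricalAwards=false
Keywords:
CAREER Reinforcement Learning Based Control

Project abstract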

Reinforcement learning (RL) is a popular framework for learning optimal decision-making in complex environments, and many RL algorithms have been developed to improve the decision-making of a single agent in normal environments. However, modern large-scale distributed learning applications usually involve multiple heterogeneous agents that interact with complex environments, making optimal decision-making fundamentally more challenging to learn. For example, when navigating multiple drones in an open area, the drones need to cooperate properly with each other and take environmental uncertainty into account. As another example, in distributed wireless networks, the interactions of the agents (e.g., base stations or mobile phones) are subject to heterogeneous constraints on power, bandwidth, etc. This project aims to develop a resilient RL framework for managing heterogeneous multi-agent systems in complex environments, and to systematically design efficient multi-agent RL algorithms with comprehensive convergence and complexity analysis. The project will produce RL algorithm packages that are fully accessible to the public. The research activities will also generate positive educational impacts on undergraduate and graduate students. The materials developed by this project will be integrated into courses on machine learning and optimization, and will benefit interdisciplinary students majoring in electrical and computer engineering, statistics, and computer science. The project will actively involve underrepresented students and integrate research with education for undergraduate and graduate students in STEM. It will also produce introductory materials for K-12 students to be used in engineering summer research programs.
The overarching goal of this project is to develop a resilient RL framework for managing multi-agent systems that involve heterogeneous agents in complex and structured environments, and to systematically design scalable and computationally efficient RL algorithms with rigorous and comprehensive convergence and complexity analysis for managing such systems. The proposed research includes three major thrusts. First, to manage cooperative agents with heterogeneous constraints in various types of structured environments (e.g., homogeneity and uncertainty), the environment model structure will be leveraged to develop fully decentralized policy optimization algorithms with convergence and complexity analysis. Second, to manage competitive agents with heterogeneous constraints in uncertain environments, new tractable notions of constrained and robust equilibrium will be proposed. Their fundamental structures and properties will be studied, based on which fully decentralized primal-dual type policy optimization algorithms and robust value-based algorithms with convergence guarantees will be developed. Lastly, to improve the generalizability of agents’ policies across heterogeneous environments, a new assistive RL framework will be developed that can substantially enhance generalizability using a few rounds of information exchange without data sharing. These RL algorithms will be applied to learn resilient and optimal control policies for interference management in wireless networks and energy control in power networks.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
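The abstract mentions primal-dual type policy optimization under constraints without giving details. As a rough illustration only, and not the project's algorithm, the sketch below shows the basic primal-dual idea on a toy problem: a single agent in a single state chooses among three actions to maximize expected reward while keeping expected cost (say, transmit power) within a budget. The primal step is an entropy-regularized softmax best response to the Lagrangian, and the dual step is projected gradient ascent on the multiplier. The function names, reward and cost values, budget, temperature, and step size are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lagrangian_policy(rewards, costs, lam, temperature):
    """Primal step: entropy-regularized best response to reward - lam * cost."""
    return softmax([(r - lam * c) / temperature for r, c in zip(rewards, costs)])


def dual_ascent(rewards, costs, budget, temperature=0.1, step=0.2, iters=500):
    """Toy primal-dual loop (hypothetical helper, single agent, single state).

    Raises the multiplier lam while the expected cost exceeds the budget and
    lowers it otherwise, projecting onto lam >= 0 after every update.
    """
    lam = 0.0
    for _ in range(iters):
        policy = lagrangian_policy(rewards, costs, lam, temperature)
        expected_cost = sum(p * c for p, c in zip(policy, costs))
        lam = max(0.0, lam + step * (expected_cost - budget))
    return lagrangian_policy(rewards, costs, lam, temperature), lam


if __name__ == "__main__":
    # Assumed toy data: higher reward comes with higher cost.
    rewards = [1.0, 0.6, 0.2]
    costs = [1.0, 0.4, 0.0]
    policy, lam = dual_ascent(rewards, costs, budget=0.5)
    print("policy:", [round(p, 3) for p in policy])
    print("multiplier:", round(lam, 3))
    print("expected cost:", round(sum(p * c for p, c in zip(policy, costs)), 3))
```

With these assumed numbers the unconstrained policy would spend almost the full cost of the most rewarding action, so the multiplier grows until the expected cost meets the budget. The decentralized, multi-agent, and robust settings described in the abstract are considerably harder and are not covered by this sketch.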
