
Decentralized stochastic control of multi-agent teams: approximation, learning, and signaling

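Basic information

Grant number:
RGPIN-2021-03511
Principal investigator:
Mahajan, Aditya
Amount:
$33,500
Host institution:
McGill University
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Period:
2022-01-01 to 2023-12-31
Project status:
Completed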

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=752186
Keywords:
Decentralized stochastic control multi agent

Project abstract

We are moving towards a future in which multiple interconnected autonomous agents will interact with humans in shared environments. Examples include self-driving cars; robotic assistants in homes, factory floors, and warehouses; and Industry 4.0, where automated control algorithms supervised by human operators control multiple interconnected industrial plants. A salient feature of such environments is that the agents have different information, yet they need to cooperate and coordinate their actions to achieve a common goal. The agents may be uncertain about the system model and must be able to adapt to stochastic changes in the environment. The long-term goal of the proposed research program is to develop theory and algorithms that address these salient features and provide a systematic methodology for designing multiple agents operating in dynamic, stochastic, and uncertain environments, thereby enabling the technologies of the future. The proposal maps a five-year research program to pursue three research directions:

(i) Approximation guarantees in decentralized control: Quantify the effect of model uncertainty and model approximation on the performance of decentralized systems. Use these results to develop a solution framework that provides approximately optimal policies for hitherto unsolved information structures, and apply it to networked control systems.

(ii) Learning with decentralized information: Develop a multi-agent reinforcement learning (MARL) framework for the decentralized learning and decentralized execution paradigm. Characterize the asymptotic optimality and regret of MARL algorithms. Identify trade-offs between speed of convergence and performance of the converged policies by restricting attention to policies with specific structure. Use this trade-off to investigate explainable and interpretable decision making in human-robot teams.

(iii) Role of signaling in multi-agent systems: Characterize what and when to communicate over explicit communication channels when communication is costly and the system model is potentially unknown. Build on these results to characterize when and how to signal information via implicit communication. Determine the impact of implicit communication on explainable and interpretable decision making in human-robot teams.

The proposed research program will provide broad training to 5 PhD, 3 MEng, and 5 undergraduate students in fundamental areas of Systems and Control and Reinforcement Learning, giving them a solid foundation to be at the forefront of innovation in a growing and transformative research field. The results will advance the state of knowledge in decentralized stochastic control and multi-agent reinforcement learning, and will contribute to the emergence of new technologies that will maintain Canada's position as an innovator in the machine learning, energy, automotive, aerospace, and information technology sectors.
