
Decentralized stochastic control of multi-agent teams: approximation, learning, and signaling


Basic information

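Grant number:
RGPIN-2021-03511
Principal investigator:
Mahajan, Aditya
Amount:
$33,500
Host institution:
McGill University
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Start and end dates:
2021-01-01 to 2022-12-31
Project status:
Completed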

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=742593
Keywords:
Decentralized stochastic control multi agent

Project abstract

We are moving towards a future in which multiple interconnected autonomous agents will interact with humans in shared environments. Examples include self-driving cars; robotic assistants in homes, factory floors, and warehouses; and Industry 4.0, where automated control algorithms supervised by human operators control multiple interconnected industrial plants. A salient feature of such environments is that the agents have different information, yet they need to cooperate and coordinate their actions to achieve a common goal. The agents may have uncertainty about the system model and must be able to adapt to stochastic changes in the environment. The long-term goal of the proposed research program is to develop theory and algorithms that address these salient features, provide a systematic methodology to design multiple agents operating in dynamic, stochastic, and uncertain environments and, thereby, enable the technologies of the future. The proposal maps a five-year research program to pursue three research directions:
(i) Approximation guarantees in decentralized control: Quantify the effect of model uncertainty and model approximation on the performance of decentralized systems. Use these results to develop a solution framework that provides approximately optimal policies for hitherto unsolved information structures, and apply it to networked control systems.
(ii) Learning with decentralized information: Develop a multi-agent reinforcement learning (MARL) framework for the decentralized learning and decentralized execution paradigm. Characterize the asymptotic optimality and regret of MARL algorithms. Identify trade-offs between speed of convergence and performance of the converged policies by restricting attention to policies with specific structure. Use this trade-off to investigate explainable and interpretable decision making in human-robot teams.
(iii) Role of signaling in multi-agent systems: Characterize what and when to communicate over explicit communication channels when communication is costly and the system model is potentially unknown. Build on these results to characterize when and how to signal information via implicit communication. Determine the impact of implicit communication on explainable and interpretable decision making in human-robot teams.
The proposed research program will provide broad training to 5 PhD, 3 MEng, and 5 UG students in fundamental areas of Systems and Control and Reinforcement Learning, thereby providing them with a solid foundation to be at the forefront of innovation in a growing and transformative research field. The results will advance the state of knowledge in decentralized stochastic control and multi-agent reinforcement learning, and will contribute to the emergence of new technologies that will maintain Canada's position as an innovator in the machine learning, energy, automotive, aerospace, and information technology sectors.
