Partially Observable Multi-agent Inverse Reinforcement Learning

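Basic information

Grant number: 2894217
Principal investigator:
Amount: --
Host institution: Imperial College London
Host institution country: United Kingdom
Project category: Studentship
Fiscal year: 2023
Funding country: United Kingdom
Duration: 2023 to (no data)
Project status: Not concluded

Source: https://gtr.ukri.org/projects?ref=studentship-2894217
Keywords: Partially Observable Multi agent Inverse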

Project abstract

A major challenge in reinforcement learning is approximating the reward structure. Inverse reinforcement learning addresses this by learning the reward structure from expert demonstrations. The approach has been highly successful; however, little literature focuses on inverse reinforcement learning for multi-agent systems, even though the use of multi-agent systems is on the rise. Current multi-agent inverse reinforcement learning methods model the system as fully observable, an assumption that is often impractical in real-world environments. In a partially observable environment, we instead assume that the observed data can be noisy or even incorrect, which better reflects real systems. To address this knowledge gap, the main research question of this project is: what inverse reinforcement learning algorithms can be used in environments that can be modelled as multi-agent partially observable Markov decision processes? As an extension, we will evaluate how these algorithms perform in simulated and practical environments. To answer these questions, we will first design algorithms based on a theoretical understanding of the problem. We will then apply the algorithms to simulated games, namely standard benchmark games for partially observable multi-agent reinforcement learning, and compare their performance against other algorithms. The same evaluation will be carried out on a practical example using real data, with the goal of measuring how accurately human behaviour can be modelled in a multi-agent setting. Finally, we will explore how multi-agent inverse reinforcement learning can improve traditional multi-agent reinforcement learning methods by providing a more systematic way of modelling their reward structures. These new methods will likewise be evaluated in simulated and practical environments.
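The core inverse reinforcement learning step can be made concrete with a deliberately small sketch. The setting is our own simplification and not the project's method: a single agent, a single decision step, a linear reward r(a) = w·φ(a), and a Boltzmann-rational expert who picks action a with probability proportional to exp(r(a)). Partial observability is stood in for by a hypothetical noise model in which each recorded action is the expert's true choice with probability 1 - ε and a uniformly random action otherwise. The names `softmax`, `fit_reward_weights` and `eps`, and the toy counts, are illustrative assumptions. Fitting w by maximum likelihood with the noise model included recovers the expert's preferences, whereas ignoring the noise biases the estimate toward a flatter policy.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_reward_weights(features, counts, eps=0.0, lr=1.0, steps=5000):
    """Gradient ascent on the log-likelihood of observed expert actions.

    features[a] is the feature vector phi(a) of action a and counts[a] is how
    often action a was observed. Hypothetical noise model: each observation is
    the expert's true action with probability 1 - eps, otherwise uniform.
    """
    n_actions, n_feats = len(features), len(features[0])
    total = sum(counts)
    w = [0.0] * n_feats
    for _ in range(steps):
        # Boltzmann-rational policy: pi(a) proportional to exp(w . phi(a)).
        pi = softmax([sum(wk * fk for wk, fk in zip(w, phi)) for phi in features])
        # Expected feature vector under the current policy.
        mean_phi = [sum(pi[a] * features[a][k] for a in range(n_actions))
                    for k in range(n_feats)]
        grad = [0.0] * n_feats
        for a in range(n_actions):
            if counts[a] == 0:
                continue  # unobserved actions contribute no likelihood term
            # Probability of *observing* action a under the noise model.
            q = (1 - eps) * pi[a] + eps / n_actions
            # d log q / d w = (1 - eps) * pi(a) * (phi(a) - E_pi[phi]) / q
            scale = (counts[a] / total) * (1 - eps) * pi[a] / q
            for k in range(n_feats):
                grad[k] += scale * (features[a][k] - mean_phi[k])
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


# Three actions; the third is a zero-feature baseline, so w is identifiable.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
counts = [50, 30, 20]  # observed action frequencies (toy data)

naive = fit_reward_weights(features, counts)            # ignores observation noise
robust = fit_reward_weights(features, counts, eps=0.3)  # assumes 30% corrupted labels
print(naive)   # approximately [log 2.5, log 1.5] = [0.916, 0.405]
print(robust)  # approximately [log 4, log 2] = [1.386, 0.693]
```

With 30% of observations corrupted, the naive fit attributes the spread-out counts to weaker preferences, while the noise-aware fit recovers a sharper expert policy (π ≈ [4/7, 2/7, 1/7]). Extending this to sequential, multi-agent, partially observable settings, where each agent reasons over observation histories rather than a single noisy label, is the kind of problem the project sets out to address.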
Studying multi-agent reinforcement learning in partially observable environments will allow inverse reinforcement learning to be applied far more effectively to real-world systems. One of the main potential benefits lies in the development of human-compatible AI systems. Any interaction between a human and an AI can be classified as a multi-agent system. To design such an AI system for the human's benefit, we need to model human behaviour and needs accurately, which can be done with inverse reinforcement learning. Another important application is the estimation of human needs. In economics, psychology, and engineering, it is important to understand what humans need in order to design systems that address those needs. This research will allow us to model these behaviours accurately in realistic scenarios in the presence of other intelligent beings. These estimates and models can in turn be used to design better engineering systems, by simulating human behaviour and testing how humans will interact with new systems. This research aligns with the EPSRC engineering and artificial intelligence research areas. A key factor in integrating artificial intelligence into engineered systems, such as future smart cities, is the ability of humans to interact intelligently with them. This research addresses that directly by providing more accurate models of human behaviour and by making it possible to train artificially intelligent agents to behave like humans. In this way, future engineering systems can be designed to meet human needs.
