Partially Observable Multi-agent Inverse Reinforcement Learning
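Basic information
- Grant number: 2894217
- Principal investigator:
- Funding amount: --
- Host institution:
- Host institution country: United Kingdom
- Project category: Studentship
- Fiscal year: 2023
- Funding country: United Kingdom
- Period: 2023 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract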
A major challenge in reinforcement learning is approximating the reward structure. Inverse reinforcement learning addresses this by learning the reward structure from expert demonstrations. This approach has been successful, but little literature focuses on inverse reinforcement learning for multi-agent systems, even though the use of such systems is on the rise. Current inverse reinforcement learning methods for multi-agent systems assume a fully observable environment, which can be impractical in real-world settings. In a partially observable environment, we assume that observed data can be noisy or even incorrect, which better models real systems. To address this knowledge gap, the main research question is: what algorithms can be used for environments that can be modelled as multi-agent partially observable Markov decision processes? To extend this question, we will evaluate the performance of these algorithms in simulated and practical environments. To answer these questions, we will first design algorithms based on a theoretical understanding of the problem. We will then apply the algorithms to simulated games and evaluate their performance against other algorithms. These games will be standard benchmark games for partially observable multi-agent reinforcement learning. The same will be done with a practical example using real data. The goal of using real-world data is to see how accurately we can model human behaviour in a multi-agent setting. Finally, we will explore how multi-agent inverse reinforcement learning can improve traditional multi-agent reinforcement learning methods by providing a more systematic way of modelling reward structures for these problems. These new methods will similarly be evaluated in simulated and practical environments.
Studying multi-agent reinforcement learning in a partially observable environment allows us to apply inverse reinforcement learning to real-world systems far more effectively. One of the main potential benefits lies in the development of human-compatible AI systems. Whenever a human interacts with an AI, we can classify the interaction as a multi-agent system. To design the AI system appropriately for the benefit of the human, we need to model human behaviours and needs accurately, which can be done with inverse reinforcement learning. Another important application is the estimation of human needs. In economics, psychology, and engineering, it is important to understand what humans need in order to design systems that address those needs. This research will allow us to model these human behaviours accurately in realistic scenarios in the presence of other intelligent beings. Additionally, by using these estimates and models, we can design better engineering systems by simulating human behaviour and testing how humans will interact with the new systems. This research aligns with the EPSRC engineering and artificial intelligence research areas. One of the main factors in integrating artificial intelligence into engineered systems, such as future smart cities, is the ability of humans to interact intelligently with such systems. This research will directly address that factor by providing a more accurate way of modelling human behaviour and by making it possible to train artificially intelligent agents to behave like humans. In this way, we can design future engineering systems that address human needs.
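The abstract's notion of partial observability, where observed data can be noisy or even incorrect, can be illustrated with a minimal sketch. It is not the project's method: the function name `noisy_observations`, the binary hidden state, and the parameter `flip_prob` are all hypothetical choices made for illustration only.

```python
import random

def noisy_observations(true_state, n_agents, flip_prob, rng):
    """Return one observation per agent of a hidden binary state (0 or 1).

    Assumption for illustration: each agent independently sees the true
    state, except that with probability flip_prob its reading is flipped,
    i.e. the observation is incorrect.
    """
    observations = []
    for _ in range(n_agents):
        if rng.random() < flip_prob:
            observations.append(1 - true_state)  # corrupted reading
        else:
            observations.append(true_state)      # correct reading
    return observations

# Two agents observe the same hidden state through independent noise.
rng = random.Random(0)
hidden_state = 1
print(noisy_observations(hidden_state, n_agents=2, flip_prob=0.2, rng=rng))
```

With `flip_prob=0.0` every agent sees the true state, which corresponds to the fully observable case that the abstract describes as impractical for real-world settings.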
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Funding amount: --
- Project category: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Similar overseas grants
CAREER: Structural Estimation and Optimization for Partially Observable Markov Decision Processes and Markov Games
- Grant number: 2236477
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
CIF: SMALL: Theoretical Foundations of Partially Observable Reinforcement Learning: Minimax Sample Complexity and Provably Efficient Algorithms
- Grant number: 2315725
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Interacting observable and measurement in Quantum Field Theory
- Grant number: 2885338
- Fiscal year: 2023
- Funding amount: --
- Project category: Studentship
FRR: Semi-Structured, Under-Specified, Partially-Observable Robotic Rearrangement
- Grant number: 2309866
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Observable signatures of learning in neural circuits
- Grant number: RGPIN-2019-06379
- Fiscal year: 2022
- Funding amount: --
- Project category: Discovery Grants Program - Individual
The use of memory mechanism for long horizon, partially observable credit assignment in reinforcement learning
- Grant number: 559278-2021
- Fiscal year: 2022
- Funding amount: --
- Project category: Postgraduate Scholarships - Doctoral
Dynamical systems with observable Lyapunov irregular sets
- Grant number: 22K03342
- Fiscal year: 2022
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
Reinforcement Learning in Partially Observable Environments
- Grant number: 572486-2022
- Fiscal year: 2022
- Funding amount: --
- Project category: University Undergraduate Student Research Awards
Partially Observable Risk-Averse Control Systems and Extensions
- Grant number: 572633-2022
- Fiscal year: 2022
- Funding amount: --
- Project category: University Undergraduate Student Research Awards
Career: IIS: RI: Improving Multi-Agent Reinforcement Learning for Cooperative, Partially Observable Settings
- Grant number: 2044993
- Fiscal year: 2021
- Funding amount: --
- Project category: Continuing Grant