Conditions and methods for Decentralised Reinforcement Learning
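Basic information
- Grant number: 2619847
- Principal investigator:
- Funding amount: --
- Host institution:
- Host institution country: United Kingdom
- Project type: Studentship
- Fiscal year: 2021
- Funding country: United Kingdom
- Duration: 2021 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract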
Advances in reinforcement learning (RL) over the last decade have made it a very active research topic. Improvements in hardware performance and the combination of RL with neural networks have enabled algorithms that achieve state-of-the-art performance in many control problems, including computer games in which they beat human champions. Open questions remain, however: how to learn in more complex environments, how to learn more efficiently from limited samples, and how to learn more general tasks.

One approach to learning in very complex environments is to decentralise the control task across multiple agents rather than assigning it to a single centralised one. This can greatly reduce the complexity of learning for each agent, at the possible expense of a more limited set of policies (action plans) that the group of agents can enact, together with other technical issues affecting the stability of the training process. Decentralisation arises quite naturally in many scenarios, such as self-driving cars, where each agent can be one car, or resource assignment in a cluster of computers, where each agent could control the tasks assigned to one computer. Decentralised agents can have various levels of communication and synchronisation with each other, which affects the size of the set of policies available to them. My research deals with agents that communicate implicitly: they do not directly share their status (state information) with each other, but they observe common features of the environment that allow them to collect information about the status of the other agents.

The first question I aim to answer is: in which scenarios can such a decentralised RL system achieve the same level of performance as a centralised single agent? This involves setting mathematical conditions on the states, the rewards, and the policy. To do so, I model the decentralised solution as a decentralised partially observable Markov decision process (dec-POMDP), which captures both the decentralisation of the agents and each agent's partial observability of the environment. Next, I want to investigate the effect of decentralisation in more general scenarios. Can I derive theoretical bounds on the performance loss due to decentralisation under certain conditions? Are there special conditions under which decentralisation is especially useful?

Subsequently, I want to use these conditions to develop an algorithm that can easily distinguish between tasks that are decentralisable and those that are not. Depending on the form of the mathematical conditions, this may be done directly with the derived formula, but it could also involve massive computation. In the latter case, it would be useful to create approximations that make it easy to test how decentralising the solution of a task affects the theoretical performance bounds.

While there has been much research on decentralised RL algorithms that try to achieve maximum performance in every kind of scenario, the relationship between centralised and decentralised RL solutions has not been explored in depth. My PhD research aims to provide a theoretical foundation for this relationship, together with novel tools, in the form of algorithms, that let the designer of a decentralised solution know the maximum theoretical performance that a given design can achieve. This research has the potential to be applied to the control of many systems whose components must cooperate to achieve optimal performance. Examples of such systems include self-driving cars, robotics, and communication networks. My research is aligned with the EPSRC fields "Artificial Intelligence technologies" and "ICT networks and distributed systems".
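The question of when a decentralised system can match a centralised one can be illustrated on a toy problem. The sketch below is not the project's method; it is a minimal, assumed one-step cooperative problem in which the joint state is a pair of bits with a uniform prior, agent i observes only bit i, and each agent picks a binary action. All names (`centralised_value`, `decentralised_value`, `separable_reward`, `coupled_reward`) are hypothetical. Brute-force enumeration compares the best centralised value with the best decentralised value for two reward functions.

```python
from itertools import product

# Assumed toy setting: joint state (s1, s2), each bit uniform and independent.
STATES = list(product([0, 1], repeat=2))
ACTIONS = [0, 1]


def centralised_value(reward):
    """Best expected reward when a single controller sees the full state."""
    total = 0.0
    for s in STATES:
        total += max(reward(s, a1, a2) for a1, a2 in product(ACTIONS, repeat=2))
    return total / len(STATES)


def decentralised_value(reward):
    """Best expected reward when agent i sees only s_i and acts on it alone."""
    # A local policy is a lookup table indexed by the agent's own observation.
    local_policies = list(product(ACTIONS, repeat=2))
    best = float("-inf")
    for pi1, pi2 in product(local_policies, repeat=2):
        value = sum(reward(s, pi1[s[0]], pi2[s[1]]) for s in STATES) / len(STATES)
        best = max(best, value)
    return best


def separable_reward(s, a1, a2):
    """Each agent is rewarded for matching its own observed bit."""
    return float(a1 == s[0]) + float(a2 == s[1])


def coupled_reward(s, a1, a2):
    """Both agents must output the XOR of the two bits to earn the reward."""
    target = s[0] ^ s[1]
    return float(a1 == target and a2 == target)


if __name__ == "__main__":
    for name, r in [("separable", separable_reward), ("coupled", coupled_reward)]:
        print(name, centralised_value(r), decentralised_value(r))
```

With the separable reward the two values coincide, because each agent's best action depends only on what it observes. With the coupled reward the decentralised value is strictly lower, because neither agent can compute the XOR from its own bit. Enumeration like this scales exponentially with the number of observations and agents, which is one reason analytical conditions and bounds of the kind described in the abstract would be useful.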
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Funding amount: --
- Project type: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Funding amount: --
- Project type: Studentship
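Similar grants from the National Natural Science Foundation of China
Free discontinuity problems in complex image processing and their level set methods
- Grant number: 60872130
- Approval year: 2008
- Funding amount: CNY 280,000
- Project type: General Program
Computational Methods for Analyzing Toponome Data
- Grant number: 60601030
- Approval year: 2006
- Funding amount: CNY 170,000
- Project type: Young Scientists Fund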
Similar overseas grants
Impact of Urban Environmental Factors on Momentary Subjective Wellbeing (SWB) using Smartphone-Based Experience Sampling Methods
- Grant number: 2750689
- Fiscal year: 2025
- Funding amount: --
- Project type: Studentship
Developing behavioural methods to assess pain in horses
- Grant number: 2686844
- Fiscal year: 2025
- Funding amount: --
- Project type: Studentship
Population genomic methods for modelling bacterial pathogen evolution
- Grant number: DE240100316
- Fiscal year: 2024
- Funding amount: --
- Project type: Discovery Early Career Researcher Award
Development and Translation Mass Spectrometry Methods to Determine BioMarkers for Parkinson's Disease and Comorbidities
- Grant number: 2907463
- Fiscal year: 2024
- Funding amount: --
- Project type: Studentship
Non invasive methods to accelerate the development of injectable therapeutic depots
- Grant number: EP/Z532976/1
- Fiscal year: 2024
- Funding amount: --
- Project type: Research Grant
Spectral embedding methods and subsequent inference tasks on dynamic multiplex graphs
- Grant number: EP/Y002113/1
- Fiscal year: 2024
- Funding amount: --
- Project type: Research Grant
CAREER: Nonlinear Dynamics of Exciton-Polarons in Two-Dimensional Metal Halides Probed by Quantum-Optical Methods
- Grant number: 2338663
- Fiscal year: 2024
- Funding amount: --
- Project type: Continuing Grant
Conference: North American High Order Methods Con (NAHOMCon)
- Grant number: 2333724
- Fiscal year: 2024
- Funding amount: --
- Project type: Standard Grant
REU Site: Computational Methods with applications in Materials Science
- Grant number: 2348712
- Fiscal year: 2024
- Funding amount: --
- Project type: Standard Grant