Epistemic Uncertainty Estimation in Multi-Agent Reinforcement Learning
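Basic information
- Grant number: 2747642
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project category: Studentship
- Fiscal year: 2022
- Funding country: United Kingdom
- Start and end dates: 2022 to (no data)
- Project status: Not concluded
- Source:
- Keywords:
Project abstract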
This project falls in the EPSRC artificial intelligence (AI) and robotics research area.

Reinforcement Learning (RL) is a technique for training an AI agent through a system of rewards. The agent is allowed to interact with and execute actions in an environment, and rewards are awarded when the agent successfully completes the intended task. RL has a multitude of applications, including robotics, recommendation systems and healthcare.

Multi-Agent Reinforcement Learning (MARL) focuses on how multiple agents interact with each other in a common environment. Each agent is motivated by its own rewards and interests, but agents can collaborate to achieve common goals or compete with each other, resulting in complex group dynamics. The study of MARL is increasingly relevant as AI agents become widespread in many aspects of our daily lives (e.g. self-driving cars).

Aims and Objectives

In real-world scenarios, it is common for agents not to have perfect knowledge of the world around them, and modelling uncertainty is fundamental to avoiding catastrophic and dangerous failures. In other words, the agent should know what it does not know.

The goal of this project is to provide an accurate estimate of epistemic uncertainty in the MARL setting. Epistemic uncertainty refers to uncertainty caused by a lack of knowledge. In the MARL scenario, this includes missing information about the environment or about other agents' motivations and behaviour. This type of uncertainty can be reduced by taking actions to explore the environment or interact with other agents.

Obtaining a correct and calibrated uncertainty estimate could lead to safer interactions and collaboration between agents. Crucially, this includes interactions between AI agents and humans. A relevant application would be self-driving cars interacting with human drivers or with other self-driving cars using different software. Modelling uncertainty over all other agents' behaviours, regardless of what software they run or what their intentions are, is fundamental for effective and safe collaboration.

Novelty of the research methodology

Current RL techniques are highly successful, but fail to model uncertainty. In contrast, Bayesian models offer a theoretically grounded framework for reasoning about model uncertainty, but are often impossible to use in all but the simplest environments because of their extremely high computational costs. Recently, multiple techniques have been proposed to circumvent this challenge and approximate Bayesian inference, such as dropout in neural networks (Gal and Ghahramani, 2016) and Deep Ensembles (Lakshminarayanan et al., 2017).

Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. PMLR, 2016.
Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems 30 (2017).
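As an illustration only, and not the project's method, the idea that epistemic uncertainty shrinks as an agent gathers more experience can be sketched with a small bootstrap ensemble. The sketch below is an assumption-laden toy: the function names (`ensemble_disagreement`, `simulate_rewards`), the Gaussian reward model, and the use of simple sample-mean estimators in place of neural networks are all hypothetical choices made for brevity. It is a bootstrapped ensemble of averages, a much simpler relative of the Deep Ensembles cited in the abstract.

```python
import random
import statistics


def ensemble_disagreement(rewards, n_members=20, seed=0):
    """Spread (population std. dev.) of the ensemble members' mean-reward estimates.

    Each member is fitted on its own bootstrap resample of the observed
    rewards; disagreement between members serves as a rough proxy for
    epistemic uncertainty about the true mean reward.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_members):
        # Resample the data with replacement, one resample per member.
        resample = [rng.choice(rewards) for _ in rewards]
        estimates.append(statistics.fmean(resample))
    return statistics.pstdev(estimates)


def simulate_rewards(n, seed=1):
    """Hypothetical environment: noisy rewards with true mean 1.0."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, 1.0) for _ in range(n)]


# Little experience: members disagree strongly about the mean reward.
few = ensemble_disagreement(simulate_rewards(5))
# Much more experience: the disagreement shrinks.
many = ensemble_disagreement(simulate_rewards(5000))
print(f"disagreement with 5 samples:    {few:.3f}")
print(f"disagreement with 5000 samples: {many:.3f}")
```

The reward noise itself (standard deviation 1.0) does not go away with more data; only the disagreement between ensemble members does, which is the sense in which this kind of uncertainty is reducible through exploration.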
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other literature
Hitoshi Yoshiji et al.: "Mechanism by which TIMP-1 promotes fibrosis in transgenic mice." Saishin Igaku 55, 1781-1787 (2000)
- Impact factor: 0
LiDAR Implementations for Autonomous Vehicle Applications
- Published: 2021
- Impact factor: 0
Hitoshi Yoshiji et al.: "Illustrated Medicine & Science Series: Molecular Medicine of Blood Vessels." Yodosha (ed. Masabumi Shibuya), 125 (2000)
- Impact factor: 0
Effect of manidipine hydrochloride, a calcium antagonist, on isoproterenol-induced left ventricular hypertrophy: "Yoshiyama, M., Takeuchi, K., Kim, S., Hanatani, A., Omura, T., Toda, I., Akioka, K., Teragaki, M., Iwao, H. and Yoshikawa, J." Jpn Circ J. 62(1), 47-52 (1998)
- Impact factor: 0
Other grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Funding amount: --
- Project category: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Funding amount: --
- Project category: Studentship
Similar overseas grants
AMPS: Scalable Methods for Real-time Estimation of Power Systems under Uncertainty
- Grant number: 2229495
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
CAREER: New Foundations for Multi-Fidelity Prediction, Estimation, and Learning Under Uncertainty in Dynamical Systems
- Grant number: 2238913
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Development of high-precision estimation method of uncertainty in Bayesian structure inverse analysis
- Grant number: 22H01579
- Fiscal year: 2022
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (B)
Optimal estimation and uncertainty quantification for velocimetry-based pressure field reconstruction
- Grant number: RGPIN-2020-04486
- Fiscal year: 2022
- Funding amount: --
- Project category: Discovery Grants Program - Individual
High-resolution prediction and uncertainty estimation of land value and conservation costs
- Grant number: 2149243
- Fiscal year: 2022
- Funding amount: --
- Project category: Standard Grant
Automated organ segmentation in 3D medical images: Is uncertainty estimation by artificial intelligence useful for improving accuracy?
- Grant number: 21K07674
- Fiscal year: 2021
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
State Estimation Algorithms on Computer Architectures that Track Uncertainty
- Grant number: 2597692
- Fiscal year: 2021
- Funding amount: --
- Project category: Studentship
Optimal estimation and uncertainty quantification for velocimetry-based pressure field reconstruction
- Grant number: RGPIN-2020-04486
- Fiscal year: 2021
- Funding amount: --
- Project category: Discovery Grants Program - Individual
Improved Uncertainty Estimation in Misspecified Models
- Grant number: 553056-2020
- Fiscal year: 2020
- Funding amount: --
- Project category: University Undergraduate Student Research Awards
Disaster estimation of wind gust considering turbulence structure and uncertainty of peak wind velocity in urban boundary layer for extreme weather event
- Grant number: 20K14869
- Fiscal year: 2020
- Funding amount: --
- Project category: Grant-in-Aid for Early-Career Scientists


















