Exploring Causality in Reinforcement Learning for Robust Decision Making

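Basic information

Grant number:
EP/Y003187/1
Principal investigator:
Yali Du
Funding amount:
$209,700
Host institution:
King's College London
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2023
Funding country:
United Kingdom
Start and end dates:
2023 to (no data)
Project status:
Not yet concluded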

Source:
https://gtr.ukri.org/projects?ref=EP%2FY003187%2F1
Keywords:
Exploring Causality Reinforcement Learning Robust

Project abstract

Reinforcement learning (RL) has seen significant development in recent years and has demonstrated impressive capabilities in decision-making tasks such as games (AlphaStar, OpenAI Five), chatbots (ChatGPT), and recommendation systems (Microsoft). RL techniques can also be applied in many other fields, including transportation, network communications, autonomous driving, sequential treatment in healthcare, robotics, and control. Unlike traditional supervised learning, RL focuses on making a sequence of decisions to achieve a long-term goal, which makes it particularly well suited to complex problems. However, several challenges must be addressed before RL is practical for real-world applications, where changing factors such as traffic regulations, weather, and clouds cannot be fully considered when training the agent. To enable RL algorithms to be deployed in a range of real applications, we need to evaluate and improve the robustness of RL when facing complex changes in the real world and task shifts.

In this project, we aim to develop robust and generalisable reinforcement learning techniques from a causal modelling perspective. The first thrust focuses on utilising causal model learning to create compact and robust representations of tasks. Such a representation can greatly benefit the overall performance of the RL agent by reducing the complexity of the problem and making the agent's decision-making process more efficient. As a result, the agent can learn faster and generalise better to unseen tasks, which is especially important in real-world scenarios where data is scarce and task complexity can vary greatly.

The second research thrust focuses on the development of efficient and generalisable algorithms for task assignment transfer. This can enable the RL agent to adapt to new tasks more quickly and effectively and to generalise the learned knowledge to different but related tasks. This is crucial for real-world scenarios where the agent needs to operate in different environments or where the task requirements change over time.

One example of an application that would benefit from these contributions is autonomous driving in an industrial setting. While RL agents are usually trained in simulators, they may not perform well in real-world road scenarios and can be easily distracted by task-irrelevant information. For example, the visual images that autonomous cars observe contain predominantly task-irrelevant information, such as cloud shapes and architectural details, which should not influence driving decisions.

In this project, we aim to enable the agent to learn a compact and robust representation of the task, so that it retains only the state information relevant to the task, adapts safely to changing driving scenarios, and generalises its knowledge to related tasks such as adapting to the different driving rules in the United States (right-hand traffic).

A causal understanding can help identify the minimal sufficient representations that are essential for policy learning and transfer, and can achieve safe and controllable exploration by leveraging causal structures and counterfactual reasoning. It can mitigate issues suffered by most existing RL approaches, such as being data-hungry and lacking interpretability and generalisability. The outcome of this project can greatly improve the scalability and adaptability of RL agents, making them more suitable for real-world applications.

Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Yali Du

Persistence of severe global inequalities in the burden of blindness and vision loss from 1990 to 2019: findings from the Global Burden of Disease Study 2019

DOI:
Publication year:
2023
Journal:
British Journal of Ophthalmology
Impact factor:
4.1
Authors:
Yuancun Li;Hongxi Wang;Zhiqiang Guan;Cheng;P. Guo;Yali Du;Shengjie Yin;Binyao Chen;Jiao Jiang;Yueting Ma;Liu Jing;Yingzi Huang;Ke Zheng;Qian Ma;Ruiqing Zhou;Min Chen;N. Congdon;K. Qiu;Mingzhi Zhang
Corresponding author:
Mingzhi Zhang