An Abstraction-based Technique for Safe Reinforcement Learning

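Basic information

Grant number:
EP/X015823/1
Principal investigator:
Francesco Belardinelli
Amount:
$384,900
Host institution:
Imperial College London
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2023
Funding country:
United Kingdom
Start and end dates:
2023 to (no data)
Project status:
Not yet concluded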

Source:
https://gtr.ukri.org/projects?ref=EP%2FX015823%2F1
Keywords:
Abstraction based Technique Safe Reinforcement

Project abstract

Autonomous agents that learn to act in unknown environments have attracted research interest because of their wider implications for AI and their applications in complex domains, including robotics, network optimisation, and resource allocation. Currently, one of the most successful approaches is reinforcement learning (RL). However, to learn how to act, agents must explore the environment, which in safety-critical scenarios means that they might take dangerous actions, possibly harming themselves or even putting human lives at risk. Consequently, reinforcement learning is still rarely used in real-world applications, where multiple safety-critical constraints need to be satisfied simultaneously.

To alleviate this problem, RL algorithms are being combined with formal verification techniques to ensure safety in learning. Indeed, formal methods are nowadays routinely applied to the specification, design, and verification of complex systems, as they make it possible to obtain proof-like certification of correct and safe behaviour that is meant to be intelligible to system engineers and human users alike. These desirable features have motivated the adoption of formal methods for the verification of general AI systems, an area variously called safe, verifiable, or trustworthy AI. Still, the application of formal methods to AI systems raises significant new challenges, including the "black-box" nature of most machine learning algorithms in use today. Specific to the application of formal methods to RL, we identify two main shortcomings of current approaches, which will be tackled in this project:

- Most current verification methodologies do not scale well as the complexity of the application increases. This state explosion problem is particularly acute for RL scenarios, where agents might have to choose among a huge number of action/state transitions (e.g., autonomous cars).
- Systems with multiple learning agents are comparatively less explored, and therefore less understood, than single-agent settings, partly because of the high dimensionality of their state space and their non-stationarity. Yet multi-agent settings are key for applications such as platooning for autonomous vehicles and robot swarms.

To tackle both problems, we put forward an abstraction-based approach to verification, which is meant to reduce the state space, in part by leveraging symmetries of the system, while preserving all its safety-related features, thus leading to guaranteed and scalable safe behaviours. The research envisaged in this project is timely and fits with the current portfolio of EPSRC-funded research, as it aligns with the theme of AI and robotics, in particular the key strategic investment in trustworthy autonomous systems. The proposal aims to develop a verifiably safe RL methodology, which is meant to have a positive societal impact on the general public's trust in deployed AI solutions and to facilitate their adoption within society at large.

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Francesco Belardinelli

On the Stability of Learning in Network Games with Many Players

DOI:
10.48550/arxiv.2403.15848
Published:
2024
Journal:
Adaptive Agents and Multi-Agent Systems
Impact factor:
0
Authors:
A. Hussain; D.G. Leonte; Francesco Belardinelli; G. Piliouras
Corresponding author:
G. Piliouras

The Reasons that Agents Act: Intention and Instrumental Goals

DOI:
10.48550/arxiv.2402.07221
Published:
2024
Journal:
Impact factor:
0
Authors:
Francis Rhys Ward; Matt MacDermott; Francesco Belardinelli; Francesca Toni; Tom Everitt
Corresponding author:
Tom Everitt

Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos

DOI:
10.48550/arxiv.2312.11943
Published:
2023
Journal:
ArXiv
Impact factor:
0
Authors:
A. Hussain; Francesco Belardinelli
Corresponding author:
Francesco Belardinelli

Aggregating bipolar opinions through bipolar assumption-based argumentation

DOI:
10.1007/s10458-024-09684-3
Published:
2024-11-25
Journal:
Autonomous Agents and Multi-Agent Systems
Impact factor:
2.600
Authors:
Charles Dickie; Stefan Lauren; Francesco Belardinelli; Antonio Rago; Francesca Toni
Corresponding author:
Francesca Toni