An Abstraction-based Technique for Safe Reinforcement Learning
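Basic information
- Grant number: EP/X015823/1
- Principal investigator:
- Amount: $ 384,900
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2023
- Funding country: United Kingdom
- Period: 2023 to (no data)
- Project status: ongoing
- Source:
- Keywords:
Project abstract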
Autonomous agents that learn to act in unknown environments have been attracting research interest, both for their wider implications for AI and for their applications in complex domains, including robotics, network optimisation, and resource allocation. Currently, one of the most successful approaches is reinforcement learning (RL). However, to learn how to act, agents must explore the environment, which in safety-critical scenarios means that they might take dangerous actions, possibly harming themselves or even putting human lives at risk. Consequently, reinforcement learning is still rarely used in real-world applications, where multiple safety-critical constraints need to be satisfied simultaneously.
To alleviate this problem, RL algorithms are being combined with formal verification techniques to ensure safety in learning. Indeed, formal methods are nowadays routinely applied to the specification, design, and verification of complex systems, as they make it possible to obtain proof-like certification of correct and safe behaviour, which is meant to be intelligible to system engineers and human users alike. These desirable features have motivated the adoption of formal methods for the verification of general AI systems, an area that has variously been called safe, verifiable, or trustworthy AI [1]. Still, the application of formal methods to AI systems raises significant new challenges, including the "black-box" nature of most machine learning algorithms in use today. Specific to the application of formal methods to RL, we identify two main shortcomings of current approaches, which will be tackled in this project:
- Most current verification methodologies do not scale well as the complexity of the application increases. This state explosion problem is particularly acute for RL scenarios, where agents might have to choose among a huge number of action/state transitions (e.g., autonomous cars).
- Systems with multiple learning agents are comparatively less explored, and therefore less understood, than single-agent settings, partly because of the high dimensionality of their state space and their non-stationarity. Yet multi-agent settings are key for applications such as platooning for autonomous vehicles and robot swarms.
To tackle both problems, we put forward an abstraction-based approach to verification, which is meant to reduce the state space, in part by exploiting symmetries of the system, while preserving all its safety-related features, thus leading to guaranteed and scalable safe behaviours. The research envisaged in this project is timely and fits the current portfolio of EPSRC-funded research, as it aligns with the theme of AI and robotics, in particular the key strategic investment in trustworthy autonomous systems. The present proposal aims to develop a verifiably safe RL methodology, which is meant to have a positive societal impact on the general public's trust in deployed AI solutions, and to facilitate their adoption within society at large.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Francesco Belardinelli
On the Stability of Learning in Network Games with Many Players
- DOI: 10.48550/arxiv.2403.15848
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: A. Hussain; D.G. Leonte; Francesco Belardinelli; G. Piliouras
- Corresponding author: G. Piliouras
The Reasons that Agents Act: Intention and Instrumental Goals
- DOI: 10.48550/arxiv.2402.07221
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Francis Rhys Ward; Matt MacDermott; Francesco Belardinelli; Francesca Toni; Tom Everitt
- Corresponding author: Tom Everitt
Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos
- DOI: 10.48550/arxiv.2312.11943
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: A. Hussain; Francesco Belardinelli
- Corresponding author: Francesco Belardinelli
Aggregating bipolar opinions through bipolar assumption-based argumentation
- DOI: 10.1007/s10458-024-09684-3
- Publication date: 2024-11-25
- Journal:
- Impact factor: 2.600
- Authors: Charles Dickie; Stefan Lauren; Francesco Belardinelli; Antonio Rago; Francesca Toni
- Corresponding author: Francesca Toni
Other grants held by Francesco Belardinelli
Strategy Logics for the Verification of Security Protocols
- Grant number: EP/V009214/1
- Fiscal year: 2021
- Funding amount: $ 384,900
- Project category: Research Grant
The Third International Workshop on Formal Methods in Artificial Intelligence
- Grant number: EP/V008013/1
- Fiscal year: 2021
- Funding amount: $ 384,900
- Project category: Research Grant
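Similar grants from the National Natural Science Foundation of China
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Research Fund for International Young Scientists
Exploring the Intrinsic Mechanisms of CEO Turnover and Market Reaction: An Explanation Based on Information Asymmetry
- Grant number: W2433169
- Approval year: 2024
- Funding amount:
- Project category: Research Fund for International Scientists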
Incentive and governance mechanism study of corporate greenwashing behavior in China: Based on an integrated view of econfiguration of environmental authority and decoupling logic
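- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Research Fund for International Scientists
In situ dynamic study of the nucleation and growth mechanism of TCP phases in advanced Re- and Ru-containing nickel-based single-crystal superalloys
- Grant number: 52301178
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Effect of chemical inhomogeneity on irradiation behaviour in NbZrTi-based multi-principal-element alloys
- Grant number: 12305290
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Population-based epidemiological study of how the ocular surface microbiota affects the onset of dry eye in patients with diabetes
- Grant number: 82371110
- Approval year: 2023
- Funding amount: CNY 490,000
- Project category: General Program
Evolution mechanism of irradiation-induced dislocation loops in the nickel-based UNS N10003 alloy and its effect on mechanical properties
- Grant number: 12375280
- Approval year: 2023
- Funding amount: CNY 530,000
- Project category: General Program
Structural characteristics and structure-property relationships of CuAgSe-based thermoelectric materials
- Grant number: 22375214
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
A study on prototype flexible multifunctional graphene foam-based sensing grid
- Grant number:
- Approval year: 2020
- Funding amount: CNY 200,000
- Project category:
A big-data-based quantitative study of the impact of urbanisation on seasonal influenza transmission in China and its mechanisms
- Grant number: 82003509
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund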
Similar overseas grants
A novel damage characterization technique based on adaptive deconvolution extraction algorithm of multivariate AE signals for accurate diagnosis of osteoarthritic knees
- Grant number: 24K07389
- Fiscal year: 2024
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Scientific Research (C)
Creation of low-noise quantum 3D imaging technique based on transport of intensity equation
- Grant number: 23K17749
- Fiscal year: 2023
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Elucidating the source of marine recalcitrant organic matter by omics-based technique
- Grant number: 23KJ0361
- Fiscal year: 2023
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for JSPS Fellows
Development of a new asset-management approach using a fast simulation technique based upon probability measure transformation
- Grant number: 23K11000
- Fiscal year: 2023
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Scientific Research (C)
Gate insulator deposition process for GaN-based MOS devices using mist-CVD technique
- Grant number: 23K03973
- Fiscal year: 2023
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Scientific Research (C)
Research on an innovative quantitative analysis system for radioactive waste based on a complete gamma-ray visualization technique
- Grant number: 23H01898
- Fiscal year: 2023
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Scientific Research (B)
Development of the new breast biopsy technique using pseudo tomosynthesis image aimed for the accuracy improvement based on the artificial intelligence
- Grant number: 23K17230
- Fiscal year: 2023
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Early-Career Scientists
Developing a Novel Phosphorus-Based Bioconjugation Technique for Enzyme Immobilization
- Grant number: 575729-2022
- Fiscal year: 2022
- Funding amount: $ 384,900
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Master's
Study on the diversity of the acceptance of different cultures, based on the reconstruction of the pottery beating paddle and the beating technique
- Grant number: 22K00994
- Fiscal year: 2022
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Scientific Research (C)
Development of an artificial intelligence-based drug discovery technique for mid-sized molecules targeting new protein–protein interaction
- Grant number: 22K15258
- Fiscal year: 2022
- Funding amount: $ 384,900
- Project category: Grant-in-Aid for Early-Career Scientists