FMitF: Track I: Safe Multi-Agent Reinforcement Learning with Shielding
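Basic information
- Award number: 2319500
- Amount: $750,000
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Start and end dates: 2023-10-01 to 2027-09-30
- Project status: Ongoing
Project abstract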
This project combines expertise from formal methods (FM) and reinforcement learning (RL) to develop a novel methodology for building safe multi-agent RL (MARL) systems. RL methods are good at solving complex tasks in real-world environments (e.g., robots learning to navigate unknown environments), but cannot provide "hard" guarantees on safety (e.g., robots guaranteed to never collide with each other). Formal methods provide rigorous safety guarantees, but are difficult to scale to real-world settings. This project seeks to combine the best of both worlds in order to devise methods that are capable of learning to solve complex tasks in real-world environments, while at the same time ensuring safety. The project's novelty lies in combining techniques such as shield synthesis from FM and directed exploration from RL into a safety-focused methodology, implementing that methodology in a tool suite, and evaluating it on a set of benchmarks. The project's impacts are in transforming the way RL systems are developed and deployed so that they can be used in safety-critical settings. Broader impacts include broadening participation in research to diverse groups and involving undergraduate students.
Key concepts of the project are safety shields and safety coaches. Safety shields are to be used in safety-critical situations, where safety is paramount (either during training or during execution). Shields prevent safety violations by intercepting (and modifying) potentially unsafe actions by the agents. Safety coaches are to be used when safety violations can be tolerated (e.g., in virtual training or simulated execution). Coaches train for safety by encouraging agents to make mistakes, and to learn from them. Shields are a known concept, but have not been studied in the most common MARL settings: decentralized execution or partial observability. Coaches are a novel concept introduced in this project, which will teach agents to be safe while solving the task.
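To make the idea of a shield concrete, the following is a minimal sketch of a centralized shield for a toy two-dimensional grid world. It is not the project's method: the abstract targets decentralized shield synthesis from formal specifications, whereas this sketch uses a hand-written collision check over the joint action. The action set `MOVES` and the functions `step`, `is_safe`, and `shield` are hypothetical names introduced only for illustration, and the agents are assumed to start in distinct cells.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]

# Hypothetical action set for a toy grid world (an assumption, not from the project).
MOVES: Dict[str, Pos] = {
    "stay": (0, 0),
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def step(pos: Pos, action: str) -> Pos:
    """Return the cell an agent occupies after taking `action`."""
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)


def is_safe(positions: Dict[str, Pos], actions: Dict[str, str]) -> bool:
    """A joint action is safe if no two agents end in the same cell or swap cells."""
    nxt = {a: step(positions[a], actions[a]) for a in positions}
    agents = list(positions)
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            if nxt[a] == nxt[b]:
                return False  # both agents would occupy the same cell
            if nxt[a] == positions[b] and nxt[b] == positions[a]:
                return False  # agents would pass through each other
    return True


def shield(positions: Dict[str, Pos], proposed: Dict[str, str]) -> Dict[str, str]:
    """Intercept a proposed joint action and modify it only if it is unsafe.

    Agents are overridden with "stay" one at a time (in name order) until the
    joint action is safe. If every agent stays, the result is safe because the
    agents are assumed to start in distinct cells.
    """
    corrected = dict(proposed)
    for agent in sorted(corrected):
        if is_safe(positions, corrected):
            break
        corrected[agent] = "stay"
    return corrected


if __name__ == "__main__":
    positions = {"a": (0, 0), "b": (2, 0)}
    # Both agents head for cell (1, 0); the shield holds agent "a" back.
    print(shield(positions, {"a": "right", "b": "left"}))
```

In this sketch a safe joint action passes through unchanged, so the shield interferes with learning only when a violation is imminent. The fixed override order is a simplification; it does not attempt to find the smallest modification of the proposed action.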
The project will develop (1) new formal methods and concepts, specifically decentralized shield synthesis and safety coaches, (2) new MARL techniques, specifically, safety-directed exploration, training for safety, and hardwiring safety information directly into agent policies, and (3) novel applications of model learning and abstraction refinement to the MARL setting.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Stavros Tripakis
Predictive runtime enforcement
- DOI: 10.1007/s10703-017-0271-1
- Published: 2017-02-23
- Impact factor: 0.800
- Authors: Srinivas Pinisetty; Viorel Preoteasa; Stavros Tripakis; Thierry Jéron; Yliès Falcone; Hervé Marchand
- Corresponding author: Hervé Marchand
Automatic generation of path conditions for concurrent timed systems
- DOI: 10.1016/j.tcs.2008.03.012
- Published: 2008-09-28
- Authors: Saddek Bensalem; Doron Peled; Hongyang Qu; Stavros Tripakis
- Corresponding author: Stavros Tripakis
The Science of Software and System Design
- Published: 2018
- Impact factor: 0
- Authors: Stavros Tripakis
- Corresponding author: Stavros Tripakis
Implémentabilité des automates temporisés
- DOI: 10.3166/jesa.39.395-406
- Published: 2005
- Impact factor: 0
- Authors: K. Altisen; Nicolas Markey; P. Reynier; Stavros Tripakis
- Corresponding author: Stavros Tripakis
by Enumeration Modulo Isomorphisms
- Impact factor: 0
- Authors: Derek Egolf; Stavros Tripakis
- Corresponding author: Stavros Tripakis
Other grants held by Stavros Tripakis
SaTC: CORE: Medium: Collaborative: Bridging the Gap between Protocol Design and Implementation through Automated Mapping
- Award number: 1801546
- Fiscal year: 2018
- Funding amount: $750,000
- Award type: Continuing Grant
CPS: Breakthrough: Compositional System Modeling with Interfaces (COSMOI)
- Award number: 1329759
- Fiscal year: 2013
- Funding amount: $750,000
- Award type: Standard Grant
Similar overseas grants
NSF Convergence Accelerator Track K: Towards Resilient, Equitable, Safe and Sustainable Water for Islands (RESSI-H2O)
- Award number: 2344418
- Fiscal year: 2024
- Funding amount: $750,000
- Award type: Standard Grant
SCC-IRG Track 1: Smart and Safe Prescribed Burning for Rangeland and Wildland Urban Interface Communities
- Award number: 2306603
- Fiscal year: 2023
- Funding amount: $750,000
- Award type: Standard Grant
Collaborative Research: FMitF: Track I: Designing Safe and Robust Human-machine Interactions with Fuzzy Mental Models
- Award number: 2319318
- Fiscal year: 2023
- Funding amount: $750,000
- Award type: Standard Grant
Collaborative Research: FMitF: Track I: Composable Verification of Crash-Safe Distributed Systems with Grove
- Award number: 2318722
- Fiscal year: 2023
- Funding amount: $750,000
- Award type: Standard Grant
Convergence Accelerator Track J Phase 2: Rapid Detection Technologies and Decision-Support Systems for Safe, Equitable Food Systems
- Award number: 2344877
- Fiscal year: 2023
- Funding amount: $750,000
- Award type: Cooperative Agreement
Collaborative Research: FMitF: Track I: Designing Safe and Robust Human-machine Interactions with Fuzzy Mental Models
- Award number: 2319317
- Fiscal year: 2023
- Funding amount: $750,000
- Award type: Standard Grant
FMitF: Track I: Safe, Efficient Persistent Memory Systems
- Award number: 2220410
- Fiscal year: 2022
- Funding amount: $750,000
- Award type: Standard Grant
Collaborative Research: FMitF: Track I: Composable Verification of Crash-Safe Distributed Systems with Grove
- Award number: 2123864
- Fiscal year: 2021
- Funding amount: $750,000
- Award type: Standard Grant
SCC-CIVIC-PG Track B: Remote Monitoring of Small Rural Water Systems to Ensure Safe Drinking Water through Disasters and Natural Recovery
- Award number: 2043847
- Fiscal year: 2021
- Funding amount: $750,000
- Award type: Standard Grant
SCC-CVIC-PG Track A: Enabling Safe, Community-wide Bike-to-Work Strategies via Participatory Sensing
- Award number: 2044034
- Fiscal year: 2021
- Funding amount: $750,000
- Award type: Standard Grant