FMitF: Track I: Safe Multi-Agent Reinforcement Learning with Shielding

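Basic information

Award number:
2319500
Principal investigator:
Stavros Tripakis
Amount:
$0.75 million
Host institution:
Northeastern University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-10-01 to 2027-09-30
Project status:
Ongoing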

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2319500&HistoricalAwards=false
Keywords:
FMitF Track Safe Multi Agent

Project abstract

This project combines expertise from formal methods (FM) and reinforcement learning (RL) to develop a novel methodology for building safe multi-agent RL (MARL) systems. RL methods are good at solving complex tasks in real-world environments (e.g., robots learning to navigate unknown environments), but cannot provide "hard" guarantees on safety (e.g., robots guaranteed to never collide with each other). Formal methods provide rigorous safety guarantees, but are difficult to scale to real-world settings. This project seeks to combine the best of both worlds in order to devise methods that are capable of learning to solve complex tasks in real-world environments, while at the same time ensuring safety. The project's novelties are a combination of techniques such as shield synthesis from FM and directed exploration from RL into a novel safety-focused methodology, as well as its implementation into a tool suite and its evaluation on a set of benchmarks. The project's impacts are in transforming the way RL systems are developed and deployed so that they can be used in safety-critical settings. Broader impacts include broadening participation in research to diverse groups and involving undergraduate students.

Key concepts of the project are safety shields and safety coaches. Safety shields are to be used in safety-critical situations, where safety is paramount (either during training or during execution). Shields prevent safety violations by intercepting (and modifying) potentially unsafe actions by the agents. Safety coaches are to be used when safety violations can be tolerated (e.g., in virtual training or simulated execution). Coaches train for safety by encouraging agents to make mistakes, and to learn from them. Shields are a known concept, but have not been studied in the most common MARL settings: decentralized execution or partial observability. Coaches are a novel concept introduced in this project, which will teach agents to be safe while solving the task. The project will develop (1) new formal methods and concepts, specifically decentralized shield synthesis and safety coaches, (2) new MARL techniques, specifically, safety-directed exploration, training for safety, and hardwiring safety information directly into agent policies, and (3) novel applications of model learning and abstraction refinement to the MARL setting.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
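
The abstract describes a shield as a component that intercepts and, where needed, modifies unsafe agent actions. The following is a minimal illustrative sketch of that idea only, not the project's method: it assumes a hypothetical grid world, a hypothetical collision rule (no two agents may end in the same cell or swap cells), and a centralized shield that sees every agent's position, whereas the project targets the harder decentralized and partially observable settings. All names (`MOVES`, `next_pos`, `conflicting_movers`, `shield`) are invented for this example.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pos = Tuple[int, int]

# Hypothetical action set for a grid world (an assumption of this sketch).
MOVES: Dict[str, Pos] = {
    "stay": (0, 0),
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def next_pos(pos: Pos, action: str) -> Pos:
    """Return the cell an agent would occupy after taking `action`."""
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)


def conflicting_movers(positions: Dict[str, Pos], actions: Dict[str, str]) -> Set[str]:
    """Return the moving agents involved in at least one predicted collision."""
    nxt = {agent: next_pos(positions[agent], actions[agent]) for agent in positions}
    movers: Set[str] = set()
    for a, b in combinations(sorted(positions), 2):
        same_cell = nxt[a] == nxt[b]
        swap = nxt[a] == positions[b] and nxt[b] == positions[a]
        if same_cell or swap:
            movers.update(x for x in (a, b) if actions[x] != "stay")
    return movers


def shield(positions: Dict[str, Pos], proposed: Dict[str, str]) -> Dict[str, str]:
    """Intercept a joint action and override unsafe moves with "stay".

    Assumes the current positions are pairwise distinct, so the all-"stay"
    joint action is always safe and the loop below terminates: each unsafe
    round forces at least one moving agent to stay.
    """
    actions = dict(proposed)
    while True:
        movers = conflicting_movers(positions, actions)
        if not movers:
            return actions
        for agent in movers:
            actions[agent] = "stay"


if __name__ == "__main__":
    positions = {"a1": (0, 0), "a2": (2, 0)}
    # Both agents propose moving into cell (1, 0); the shield overrides both.
    print(shield(positions, {"a1": "right", "a2": "left"}))
```

Overriding with "stay" is the crudest possible correction; it keeps the sketch short but says nothing about task progress. A shield synthesized from a formal safety specification would typically choose among all safe alternatives rather than a fixed fallback.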

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Stavros Tripakis

Predictive runtime enforcement

DOI:
10.1007/s10703-017-0271-1
Published:
2017-02-23
Journal:
FORMAL METHODS IN SYSTEM DESIGN
Impact factor:
0.800
Authors:
Srinivas Pinisetty;Viorel Preoteasa;Stavros Tripakis;Thierry Jéron;Yliès Falcone;Hervé Marchand
Corresponding author:
Hervé Marchand