SHF: Small: Omega-Regular Objectives for Model-Free Reinforcement Learning

Basic Information

Award number:
2009022
Principal investigator:
Ashutosh Trivedi
Amount:
$500,000
Host institution:
University of Colorado at Boulder
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Duration:
2020-06-15 to 2024-05-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2009022&HistoricalAwards=false
Keywords:
SHF Small Omega Regular Objectives

Project Abstract

In Reinforcement Learning (RL), agents rely on rewards that promote the achievement of given objectives. Widespread use of RL-enabled systems, such as swarm robots, autonomous vehicles, Internet-of-Things, and social networks, will dramatically improve the quality of modern life. However, their applications in safety-critical settings imply that methods to ensure their correctness are of paramount importance. This project develops a rigorous approach to the design and verification of RL-enabled systems that addresses issues of safety, efficiency, and scalability. Logic provides a foundation for the rigorous specification of learning objectives. Model-free RL, which is the type of learning supported by neural networks, promises scalability. Hence, this project is about translating logic-based requirements into the scalar reward form that is needed in model-free RL. Bridging the gap between logic specifications and model-free RL requires a translation that is faithful (greater reward means higher probability of satisfying the objective) and effective (the reward should help RL algorithms to learn quickly and reliably). This project develops foundations for faithful and effective translations of omega-regular specifications and explores their applications to synthesis of RL-enabled systems. The transition from theory to practice will be measured by the success of an open-source tool for the synthesis of interpreters that translate environment observations into rewards for state-of-the-art, off-the-shelf RL algorithms. Both the formal-methods and the RL communities will benefit from this project. The PIs will extend their record of technology transfer with the release of software and educational materials.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (13)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Alternating Good-for-MDPs Automata

DOI:
Year:
2022
Journal:
International Symposium on Automated Technology for Verification and Analysis (ATVA 2022)
Impact factor:
0
Authors:
Hahn, Ernst Moritz;Perez, Mateo;Schewe, Sven;Somenzi, Fabio;Trivedi, Ashutosh;Wojtczak, Dominik
Corresponding author:
Wojtczak, Dominik

Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives

DOI:
10.1007/978-3-030-59152-6_6
Year:
2020
Journal:
Impact factor:
0
Authors:
E. M. Hahn;Mateo Perez;S. Schewe;F. Somenzi;Ashutosh Trivedi;D. Wojtczak
Corresponding author:
E. M. Hahn;Mateo Perez;S. Schewe;F. Somenzi;Ashutosh Trivedi;D. Wojtczak

Translating Omega-Regular Specifications to Average Objectives for Model-Free Reinforcement Learning

DOI:
10.5555/3535850.3535933
Year:
2022
Journal:
Impact factor:
0
Authors:
M. Kazemi;Mateo Perez;F. Somenzi;Sadegh Soudjani;Ashutosh Trivedi;Alvaro Velasquez
Corresponding author:
M. Kazemi;Mateo Perez;F. Somenzi;Sadegh Soudjani;Ashutosh Trivedi;Alvaro Velasquez

Model-Free Reinforcement Learning for Branching Markov Decision Processes

DOI:
10.1007/978-3-030-81688-9_30
Year:
2021
Journal:
Computer Aided Verification. CAV 2021.
Impact factor:
0
Authors:
Hahn, E.M.;Perez, M.;Schewe, S.;Somenzi, F.;Trivedi, A.;Wojtczak, D.
Corresponding author:
Wojtczak, D.

Model-Free Reinforcement Learning for Stochastic Parity Games

DOI:
10.4230/lipics.concur.2020.21
Year:
2020
Journal:
Impact factor:
0
Authors:
E. M. Hahn;Mateo Perez;S. Schewe;F. Somenzi;Ashutosh Trivedi;D. Wojtczak
Corresponding author:
E. M. Hahn;Mateo Perez;S. Schewe;F. Somenzi;Ashutosh Trivedi;D. Wojtczak

Other publications by Ashutosh Trivedi

Delhi

DOI:
10.1177/0019556119790333
Year:
1979
Journal:
Geotechnical Characteristics of Soils and Rocks of India
Impact factor:
0
Authors:
Ashutosh Trivedi;Sadanand Ojha
Corresponding author:
Sadanand Ojha

Weighted timed games: Positive results with negative costs

DOI:
Year:
2014
Journal:
Impact factor:
0
Authors:
Benjamin Monmege;Thomas Brihaye;G. Geeraerts;Krishna Shankara Narayanan;L. Manasa;Ashutosh Trivedi
Corresponding author:
Ashutosh Trivedi

Formal verification of hyperproperties for control systems

DOI:
10.1145/3457335.3461715
Year:
2021
Journal:
Proceedings of the Workshop on Computation-Aware Algorithmic Design for Cyber-Physical Systems
Impact factor:
0
Authors:
Mahathi Anand;Vishnu Murali;Ashutosh Trivedi;Majid Zamani
Corresponding author:
Majid Zamani