Reinforcement Learning for Finite Horizons (ReLeaF)

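Basic information

Grant number:
EP/X021513/1
Principal investigator:
Sven Schewe
Amount:
$ 260,000
Host institution:
University of Liverpool
Host institution country:
United Kingdom
Project category:
Fellowship
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Not yet concluded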

Source:
https://gtr.ukri.org/projects?ref=EP%2FX021513%2F1
Keywords:
Reinforcement Learning Finite Horizons ReLeaF

Project abstract

Reinforcement learning (RL) is a technique for learning how to take actions in an initially unknown environment in order to optimise an expected outcome, which is modelled as maximising a cumulative reward. Learning algorithms with goals written as temporal specifications have three key ingredients: the translation from the specification to appropriate finite automata; the translation of these finite automata to reward structures, such that a strategy that provides optimal rewards is guaranteed to provide optimal control; and a wrapper into a discounting scheme that, for appropriate parameters, will ensure that a learner converges to an optimal strategy.

We will consider the RL problems for a popular specification language used in automation and motion planning, the finite horizon linear time temporal logic LTLf. In particular, we will study model-free RL algorithms, which are more suitable than their model-based counterparts for real-world applications where the behaviour of the environment is hard to predict. We will propose learning algorithms that provide translations from finite horizon LTL to reward structures with formal guarantees of satisfying the given goals for environments modelled as Markov Decision Processes (MDPs).

We will extend our techniques to infinite-state MDPs, including variations where formal guarantees can be provided (such as countable, finitely branching MDPs), and study conditions for our techniques to provide guarantees in more general classes, such as smoothness guarantees for compact MDPs. We will complement these lines of research by looking at goals with constraints. This effectively means considering prioritised goals, where meeting safety constraints takes precedence, while other properties (such as efficiency) are considered as tie-breakers among strategies that provide the same safety guarantees.
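The three ingredients named in the abstract (specification to automaton, automaton to reward structure, discounted learning) can be illustrated with a toy sketch. It is not the project's method: the corridor environment, the formula `(!hazard) U goal`, the hand-written three-state automaton, the reward of 1 on acceptance, and all names and parameter values below are assumptions chosen only for illustration. The learner is plain tabular Q-learning with a uniformly random behaviour policy, run on the product of environment cells and automaton states; the label of the start cell is not read, which is harmless here because the start cell is neither goal nor hazard.

```python
import random

# Hypothetical 1-D corridor: cells 0..4, hazard at cell 0, goal at cell 4.
N_CELLS, HAZARD, GOAL, START = 5, 0, 4, 2
ACTIONS = (-1, +1)          # move left / move right
Q0, ACC, REJ = 0, 1, 2      # automaton states for the formula (!hazard) U goal


def label(cell):
    """Atomic propositions that hold in a cell."""
    return {"hazard": cell == HAZARD, "goal": cell == GOAL}


def dfa_step(q, props):
    """Hand-written automaton for (!hazard) U goal; ACC and REJ are absorbing."""
    if q != Q0:
        return q
    if props["goal"]:
        return ACC
    if props["hazard"]:
        return REJ
    return Q0


def env_step(cell, action):
    """Deterministic move, clipped to the corridor."""
    return min(max(cell + action, 0), N_CELLS - 1)


def q_learning(episodes=2000, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    """Model-free Q-learning on product states (cell, automaton state)."""
    rng = random.Random(seed)
    q_table = {(c, Q0): [0.0, 0.0] for c in range(N_CELLS)}
    for _ in range(episodes):
        cell, q = START, Q0
        for _ in range(horizon):
            a = rng.randrange(len(ACTIONS))      # uniformly random exploration
            nxt = env_step(cell, ACTIONS[a])
            q_nxt = dfa_step(q, label(nxt))
            reward = 1.0 if q_nxt == ACC else 0.0  # reward only on acceptance
            done = q_nxt != Q0                   # trace accepted or rejected
            target = reward
            if not done:
                target += gamma * max(q_table[(nxt, q_nxt)])
            q_table[(cell, q)][a] += alpha * (target - q_table[(cell, q)][a])
            if done:
                break
            cell, q = nxt, q_nxt
    return q_table


def greedy_rollout(q_table, horizon=20):
    """Follow the learned greedy policy and return the final automaton state."""
    cell, q = START, Q0
    for _ in range(horizon):
        values = q_table[(cell, q)]
        a = values.index(max(values))
        cell = env_step(cell, ACTIONS[a])
        q = dfa_step(q, label(cell))
        if q != Q0:
            break
    return q


if __name__ == "__main__":
    table = q_learning()
    print("accepted:", greedy_rollout(table) == ACC)
```

In this sketch the discount factor only makes shorter accepting traces more valuable. The parameter choices that would give formal guarantees in general MDPs are the subject of the project and are not reproduced here.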

Project outputs

Journal articles (4)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Tools and Algorithms for the Construction and Analysis of Systems - 29th International Conference, TACAS 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023, Paris, France, April 22-27, 2023, Proceedings, Part I

DOI:
10.1007/978-3-031-30823-9_28
Published:
2023
Journal:
Impact factor:
0
Authors:
Park S
Corresponding author:
Park S

Automated Technology for Verification and Analysis - 21st International Symposium, ATVA 2023, Singapore, October 24-27, 2023, Proceedings, Part I

DOI:
10.1007/978-3-031-45329-8_3
Published:
2023
Journal:
Impact factor:
0
Authors:
Li Y
Corresponding author:
Li Y

Other publications by Sven Schewe

Editorial: special issue on synthesis

DOI:
10.1007/s00236-014-0198-6
Published:
2014-04-19
Journal:
ACTA INFORMATICA
Impact factor:
0.500
Authors:
Doron Peled;Sven Schewe
Corresponding author:
Sven Schewe

Digital features of chemical elements extracted from local geometries in crystal structures

DOI:
10.1039/d4dd00346b
Published:
2024-12-17
Journal:
Digital Discovery
Impact factor:
5.600
Authors:
Andrij Vasylenko;Dmytro Antypov;Sven Schewe;Luke M. Daniels;John B. Claridge;Matthew S. Dyer;Matthew J. Rosseinsky
Corresponding author:
Matthew J. Rosseinsky