Continuous Time Reinforcement Learning using Rough Paths

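Basic Information

Award number:
2153915
Principal investigator:
Samy Tindel
Amount:
approx. $655,600
Institution:
Purdue University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Project period:
2022-08-01 to 2025-07-31
Project status:
Ongoing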

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2153915&HistoricalAwards=false
Keywords:
Continuous Time Reinforcement Learning using

Project Abstract

Reinforcement learning (RL) methods have been embraced, in both academic and industrial settings, for solving a range of science and engineering problems involving dynamic system optimization. These problems run the gamut and include optimal resource allocation in, for example, ride-sharing, healthcare management and energy systems; pricing and trading risky assets in finance; and autonomous vehicles and robots. RL is also increasingly being used in the physical sciences, for instance for discovering new materials and/or exploring the properties of known materials. Despite the growing importance and breadth of applications, RL methods are often found to perform poorly in practice. RL methods have been primarily developed for discrete-time and so-called Markovian settings, while most real-world problems are better modeled in continuous time, with non-Markovian dynamics. The broad adoption of RL methods across science and engineering necessitates the investigation of how to develop RL methods for continuous-time and non-Markovian settings. This project aims at laying the foundations for addressing these questions. In addition, the PIs have developed specific aims in terms of dissemination of discoveries, survey for graduate students, national and international networking, mentoring of junior researchers as well as graduate and undergraduate students, participation in and organization of events, and interdisciplinary research.

The successful completion of this project will make significant contributions towards the theoretical analysis of continuous-time RL. The project will offer a global framework valid for general random environments. In particular, it goes beyond the somewhat restrictive Markov setting and allows for pathwise controls. At its core, this research project aims at the development of analytical results that can be used to provide theoretical guarantees for continuous-time RL problems across a range of application domains. The successful completion of the project will entail a number of new results characterizing the solution of pathwise optimal control in rough environments, the analysis of computational methods for obtaining optimal policies, as well as the analysis of numerical schemes for approximating policies and value functions using rough path signatures. The proposed efforts will have sufficient novelty to open new research areas. They will also further promote the applicability of the theoretical techniques alluded to above.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
