Lyapunov Methods for Reinforcement Learning

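Basic Information

Award number:
0070102
Principal investigator:
Andrew Barto
Amount:
approximately $119,400
Institution:
University of Massachusetts Amherst
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2000
Funding country:
United States
Period:
2000-07-01 to 2002-12-31
Status:
Completed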

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0070102&HistoricalAwards=false
Keywords:
Lyapunov Methods; Reinforcement Learning

Project Abstract

Reinforcement learning (RL) is a summary term for a collection of methods for approximating solutions to stochastic optimal control problems. RL methods have been successfully applied to a large array of such problems in a diversity of domains, including finance, logistics, telecommunications, and robot control. Although similar problems have been studied intensively for many years in control engineering and operations research, the methods developed by RL researchers have added new elements to the classical solution methods. RL methods offer novel ways to approximate solutions to problems that are too large or ill-defined for the classical solution methods to be feasible.

A significant part of RL research is directed at increasing on-line performance and speed of convergence by providing RL systems with domain knowledge. This project is concerned with knowledge related to the design of stabilizing controllers for complex dynamical systems. It will try to develop a general theory for incorporating this knowledge into RL systems. The basic idea is to mathematically define policy subspaces that have certain known stability and safety properties and to focus exploration on control laws that lie within these policy subspaces. The means by which this is achieved are based on the theory of Lyapunov stability and the associated methods of Lyapunov control design.
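The basic idea in the abstract, restricting exploration to control laws with a known stability property, can be illustrated with a small toy sketch. This is not the project's actual method: the one-dimensional dynamics, the cost, the candidate Lyapunov function `lyapunov`, and all other names below are hypothetical and chosen only for illustration. The agent runs ordinary tabular Q-learning, but it may only pick actions that strictly decrease the Lyapunov function, so every episode reaches the goal within a known number of steps even while the agent is still exploring.

```python
import random

# Hypothetical toy problem: integer state x, goal at the origin.
ACTIONS = (-2, -1, 1, 2)
GOAL = 0


def lyapunov(x):
    """Candidate Lyapunov function: distance to the goal state."""
    return abs(x - GOAL)


def step(x, a):
    """Deterministic toy dynamics with a time cost plus a control-effort cost."""
    return x + a, -1.0 - 0.1 * a * a


def safe_actions(x):
    """Actions that strictly decrease the Lyapunov function from state x."""
    return [a for a in ACTIONS if lyapunov(x + a) < lyapunov(x)]


def q_learning(episodes=200, start_range=10, alpha=0.5, gamma=1.0, eps=0.2, seed=0):
    """Tabular Q-learning whose exploration is limited to safe_actions."""
    rng = random.Random(seed)
    q = {}
    lengths = []  # (Lyapunov value at start, steps taken) for each episode
    starts = [s for s in range(-start_range, start_range + 1) if s != GOAL]
    for _ in range(episodes):
        x = rng.choice(starts)
        bound, steps = lyapunov(x), 0
        while x != GOAL:
            allowed = safe_actions(x)
            if rng.random() < eps:
                a = rng.choice(allowed)  # explore, but only among safe actions
            else:
                a = max(allowed, key=lambda b: q.get((x, b), 0.0))
            nx, r = step(x, a)
            if nx == GOAL:
                future = 0.0
            else:
                future = max(q.get((nx, b), 0.0) for b in safe_actions(nx))
            old = q.get((x, a), 0.0)
            q[(x, a)] = old + alpha * (r + gamma * future - old)
            x, steps = nx, steps + 1
        lengths.append((bound, steps))
    return q, lengths


if __name__ == "__main__":
    q, lengths = q_learning()
    # Each step lowers the integer-valued Lyapunov function by at least 1,
    # so no episode can take more steps than its starting Lyapunov value.
    print(all(steps <= bound for bound, steps in lengths))
```

In this sketch the safety property comes entirely from the action filter, not from what the agent has learned; learning only decides which of the safe actions is cheapest.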
