
CIF: Small: How Much of Reinforcement Learning is Gradient Descent?

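Basic Information

Award number:
2245059
Principal investigator:
Alexander Olshevsky
Amount:
$301.2K
Host institution:
Trustees of Boston University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-06-01 to 2026-05-31
Project status:
Ongoing (not yet concluded)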

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2245059&HistoricalAwards=false
Keywords:
CIF Small How Much Reinforcement

Project Abstract

In the past decade, reinforcement learning has achieved remarkable success in a wide range of applications, from games such as chess and Go to advanced applications such as chip design and aerial navigation. There is now ample evidence that reinforcement learning represents one of the most promising research directions to deliver the next generation of autonomous systems. However, many popular reinforcement-learning methods often fail to converge, making the use of reinforcement learning in practice more an art than a science. This project will explore a novel approach to analyzing and designing convergent reinforcement-learning methods based on a recently discovered connection to gradient descent. This connection will not only improve the analysis of existing algorithms but also lead to the development of new methods.

This project builds on a novel concept, gradient splitting, which allows classical reinforcement-learning methods to be viewed as modifications of stochastic-gradient-descent updates, which inherit many key properties of gradient descent. We will use this connection to develop variations of temporal difference learning and Q-learning which, when given a dataset sampled from a Markov decision process, will converge geometrically to the statistically optimal estimate of the true value function. Coupled with neural-network approximation, our methods will approximate the true value function with an additional error that is inversely proportional to a power of the width of the underlying neural network. These results will then be used to develop a provably convergent neural actor-critic method. The new methods we will develop will not only provide rigorous bounds on the performance of neural networks in reinforcement learning but will also result in significantly faster training times compared to existing methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
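
The gradient-descent connection mentioned in the abstract can be illustrated numerically. What follows is a minimal sketch, not the project's method: it assumes TD(0) with linear value-function approximation on a toy 3-state Markov reward process. The transition matrix, rewards, features, step size, the helper names `stationary_distribution` and `mean_td_direction`, and the quadratic `f` are all illustrative assumptions that do not come from the award record. The sketch checks that the mean TD update has the same inner product with the direction toward the TD fixed point as the negative gradient of a quadratic would, and then runs the mean update to observe convergence.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()

def mean_td_direction(theta, Phi, P, r, gamma, pi):
    """Expected TD(0) update direction under the stationary distribution."""
    td_error = r + gamma * (P @ Phi @ theta) - Phi @ theta
    return Phi.T @ (pi * td_error)

# Toy Markov reward process (all numbers are illustrative assumptions).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])   # one feature row per state
gamma = 0.9

pi = stationary_distribution(P)
D = np.diag(pi)

# The mean TD direction is linear: g(theta) = -B (theta - theta_star).
B = Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi
b = Phi.T @ D @ r
theta_star = np.linalg.solve(B, b)   # TD fixed point

# Quadratic built from the symmetric part of B (an assumption of this sketch):
# f(theta) = 0.5 * (theta - theta_star)^T S (theta - theta_star).
S = 0.5 * (B + B.T)

theta = np.array([3.0, -2.0])
x = theta - theta_star
g = mean_td_direction(theta, Phi, P, r, gamma, pi)
grad_f = S @ x

lhs = -(x @ g)     # inner product of the TD direction with the error
rhs = x @ grad_f   # inner product of the gradient of f with the error

# Run the mean-path TD iteration and record the distance to theta_star.
alpha = 1.0
theta_k = theta.copy()
errors = []
for _ in range(500):
    theta_k = theta_k + alpha * mean_td_direction(theta_k, Phi, P, r, gamma, pi)
    errors.append(np.linalg.norm(theta_k - theta_star))

print("inner products:", lhs, rhs)
print("final distance to fixed point:", errors[-1])
```

Inspecting `errors` should show the distance to the fixed point shrinking geometrically until it reaches floating-point precision. The step size `alpha = 1.0` works only because this toy matrix `B` is small in norm; a different problem would need its own choice.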
