
Collaborative Research: CIF: Medium: MoDL: Toward a Mathematical Foundation of Deep Reinforcement Learning


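Basic Information

Award number:
2212261
Principal investigator:
Simon Du
Amount:
$600,000
Awardee institution:
University of Washington
Awardee institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Project period:
2022-10-01 to 2026-09-30
Project status:
Active (not yet concluded)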

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2212261&HistoricalAwards=false
Keywords:
Collaborative Research CIF Medium MoDL

Project Abstract

Deep Reinforcement Learning (DRL), which uses neural networks to solve sequential decision-making problems, has made breakthroughs in real-world applications, such as robotics, gaming, healthcare, and transportation systems. However, current theoretical work on reinforcement learning is restricted to problems with a small number of states; as these results do not cover neural networks, they cannot be used to satisfactorily explain the empirical successes of DRL. This project seeks to bridge this gap by building a mathematical foundation for DRL that leverages ideas from approximation theory, control theory, and optimization theory. This will allow the computational and statistical complexity of DRL to be systematically characterized, and will help with designing more efficient and reliable empirical methods.

Education and outreach plans are integrated into this project. Specifically, the investigators will mentor graduate and undergraduate students (some through the STARS program for underrepresented groups at the University of Washington), develop new courses and monographs, organize research workshops, and develop course materials for a high school data science and artificial intelligence curriculum.

This project has three major components. The first thrust identifies which types of guarantees are achievable by policies for different reinforcement learning problem instances. Concretely, this requires investigating how increasingly structured problem instances enable stronger guarantees for policies; this will be done by using, and further developing, tools from non-convex optimization to describe policies that achieve stationary points, local maxima, and global maxima of the reward function. The second thrust takes the perspective of approximation theory and capacity control to investigate how the neural network complexity can be gradually increased to eventually find the most complex sub-family of neural networks that permits sample-efficient algorithms. The third thrust builds upon the knowledge gained in the first two thrusts, and is devoted to the design of computationally efficient algorithms; this will be done by leveraging tools from optimization theory and by making connections with control theory.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
