Collaborative Research: CIF: Medium: MoDL:Toward a Mathematical Foundation of Deep Reinforcement Learning
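Basic information
- Award number: 2212263
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-10-01 to 2026-09-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract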
Deep Reinforcement Learning (DRL), which uses neural networks to solve sequential decision-making problems, has made breakthroughs in real-world applications such as robotics, gaming, healthcare, and transportation systems. However, current theoretical work on reinforcement learning is restricted to problems with a small number of states; because these results do not cover neural networks, they cannot satisfactorily explain the empirical successes of DRL. This project seeks to bridge this gap by building a mathematical foundation for DRL that leverages ideas from approximation theory, control theory, and optimization theory. This will allow the computational and statistical complexity of DRL to be systematically characterized, and will help with designing more efficient and reliable empirical methods. Education and outreach plans are integrated into this project. Specifically, the investigators will mentor graduate and undergraduate students (some through the STARS program for underrepresented groups at the University of Washington), develop new courses and monographs, organize research workshops, and develop course materials for a high school data science and artificial intelligence curriculum.
This project has three major components. The first thrust identifies which types of guarantees are achievable by policies for different reinforcement learning problem instances. Concretely, this requires investigating how increasingly structured problem instances enable stronger guarantees for policies; this will be done by using, and further developing, tools from non-convex optimization to describe policies that achieve stationary points, local maxima, and global maxima of the reward function. The second thrust takes the perspective of approximation theory and capacity control to investigate how neural network complexity can be gradually increased to eventually find the most complex sub-family of neural networks that permits sample-efficient algorithms. The third thrust builds upon the knowledge gained in the first two thrusts and is devoted to the design of computationally efficient algorithms; this will be done by leveraging tools from optimization theory and by making connections with control theory.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Tengyu Ma
On the Performance of Thompson Sampling on Logistic Bandits
- DOI:
- Year published: 2019
- Journal:
- Impact factor: 0
- Authors: Shi Dong; Tengyu Ma; Benjamin Van Roy
- Corresponding author: Benjamin Van Roy
Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms
- DOI: 10.4230/lipics.approx-random.2015.829
- Year published: 2015
- Journal:
- Impact factor: 0
- Authors: Rong Ge; Tengyu Ma
- Corresponding author: Tengyu Ma
Material Parameters in the GTN Model for Ductile Fracture Simulation of G20Mn5QT Cast Steels
- DOI:
- Year published: 2022
- Journal:
- Impact factor: 3.2
- Authors: Yue Yin; Tengyu Ma; Q. Han; Yan Lu; Y. Zhang
- Corresponding author: Y. Zhang
Learning Over-Parametrized Two-Layer Neural Networks beyond NTK
- DOI:
- Year published: 2020
- Journal:
- Impact factor: 0
- Authors: Yuanzhi Li; Tengyu Ma; Hongyang Zhang
- Corresponding author: Hongyang Zhang
Mission-Oriented Networks Robustness Based on Cascade Model
- DOI:
- Year published: 2021
- Journal:
- Impact factor: 0
- Authors: Tengyu Ma; F. Yang; Chao Chang; Jun Huang
- Corresponding author: Jun Huang
Other grants held by Tengyu Ma
Collaborative Research: RI:Medium:MoDL:Mathematical and Conceptual Understanding of Large Language Models
- Award number: 2211780
- Fiscal year: 2022
- Amount: $300,000
- Award type: Standard Grant
CAREER: Toward a Comprehensive Generalization Theory for Deep Learning
- Award number: 2045685
- Fiscal year: 2021
- Amount: $300,000
- Award type: Continuing Grant
Similar National Natural Science Foundation of China (NSFC) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Year approved: 2024
- Amount: CNY 0
- Award type: Provincial or municipal project
Cell Research
- Award number: 31224802
- Year approved: 2012
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 31024804
- Year approved: 2010
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 30824808
- Year approved: 2008
- Amount: CNY 240,000
- Award type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Year approved: 2007
- Amount: CNY 450,000
- Award type: General Program
Similar overseas grants
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Award number: 2402815
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF: Small: Mathematical and Algorithmic Foundations of Multi-Task Learning
- Award number: 2343599
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF: Small: Mathematical and Algorithmic Foundations of Multi-Task Learning
- Award number: 2343600
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Award number: 2402817
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: NSF-AoF: CIF: Small: AI-assisted Waveform and Beamforming Design for Integrated Sensing and Communication
- Award number: 2326622
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Award number: 2402816
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403123
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: NSF-AoF: CIF: Small: AI-assisted Waveform and Beamforming Design for Integrated Sensing and Communication
- Award number: 2326621
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: CIF: Small: Versatile Data Synchronization: Novel Codes and Algorithms for Practical Applications
- Award number: 2312872
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant