New Algorithms for Markov Decision Processes and Reinforcement Learning
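Basic information
- Award number: 2208163
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-09-01 to 2025-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract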
Markov decision processes and reinforcement learning have had significant recent success in applications, ranging from outperforming humans in Atari games to AlphaFold overshadowing competing methods in predicting protein folding. This success results from several fundamental developments, including deep neural networks providing a powerful mechanism for representing high-dimensional functions, unprecedented computing power provided by graphics processing units and tensor processing units, and the development of novel algorithms for both prediction and control. However, there are still many challenges in applying these recent techniques to mission-critical applications in health, social and economic planning, and defense. This project aims to develop and analyze novel algorithms for Markov decision processes and reinforcement learning with the intention of making these approaches more broadly applicable. Educational impacts include postdoctoral and graduate student training, as well as undergraduate course development centered around machine learning.

This project involves the development of a unified framework for Markov decision processes based on linear programming, where the primal, dual, and primal-dual problems are studied for both the regularized and non-regularized cases. Existing algorithms based on Markov decision processes will then be connected to this unified framework. For the tabular setting, a quasi-Newton type policy gradient algorithm will be developed for general entropic regularizers. For the primal-dual problem, a rapidly converging gradient ascent-descent algorithm based on a strictly convexified formulation with a non-standard preconditioning metric will be developed. The nonlinear approximation setting will be addressed by variational actor-critic algorithms that are stable and converge at least to a local minimum. Finally, to address the double sampling issue, new algorithms based on the borrowing-from-the-future idea will be developed to significantly reduce the bias.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
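
As background for the linear-programming view mentioned in the abstract, the block below gives the textbook primal and dual linear programs for a discounted, tabular Markov decision process. It is a standard formulation offered as a sketch, not the project's own framework, and all symbols are notation assumed here rather than taken from the abstract: states s, actions a, reward r(s,a), transition kernel P(s'|s,a), discount factor γ in [0,1), and a strictly positive initial distribution ρ.

```latex
% Assumed notation (not from the abstract): v is a value function,
% \mu is a discounted state-action occupancy measure.
\begin{align*}
\text{(primal)}\quad & \min_{v}\ \sum_{s} \rho(s)\, v(s)
  \quad \text{s.t.}\quad
  v(s) \ge r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s')
  \quad \forall (s,a), \\
\text{(dual)}\quad & \max_{\mu \ge 0}\ \sum_{s,a} \mu(s,a)\, r(s,a)
  \quad \text{s.t.}\quad
  \sum_{a} \mu(s',a) = \rho(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \mu(s,a)
  \quad \forall s'.
\end{align*}
```

Under these assumptions the primal optimum is the optimal value function, and a policy can be read off a dual solution as π(a|s) = μ(s,a) / Σ_a μ(s,a) wherever the denominator is positive. How the project regularizes or convexifies these problems is not specified in the abstract.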
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Lexing Ying

Multidimensional unstructured sparse recovery via eigenmatrix
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Lexing Ying
- Corresponding author: Lexing Ying

Fast Spatial Gaussian Process Maximum Likelihood Estimation via Skeletonization Factorizations
- DOI: 10.1137/17m1116477
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Victor Minden, Anil Damle, Kenneth L. Ho, Lexing Ying
- Corresponding author: Lexing Ying

On efficient quantum block encoding of pseudo-differential operators
- DOI:
- Published: 2023
- Journal:
- Impact factor: 6.4
- Authors: Haoya Li, Hongkang Ni, Lexing Ying
- Corresponding author: Lexing Ying

Quantum Hamiltonian Learning for the Fermi-Hubbard Model
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Hongkang Ni, Haoya Li, Lexing Ying
- Corresponding author: Lexing Ying

On Lyapunov functions and particle methods for regularized minimax problems
- DOI: 10.1007/s40687-022-00315-5
- Published: 2022-03
- Journal:
- Impact factor: 1.2
- Authors: Lexing Ying
- Corresponding author: Lexing Ying
Other grants awarded to Lexing Ying

Tensor Network Computation: Representations, Algebra, and Applications
- Award number: 1818449
- Fiscal year: 2018
- Funding amount: $400,000
- Award type: Continuing Grant

Effective Preconditioners for High Frequency Wave Equations
- Award number: 1521830
- Fiscal year: 2015
- Funding amount: $400,000
- Award type: Continuing Grant

CDI-Type I: Collaborative Research: High-Dimensional Phase-Space Subdivisions for Seismic Imaging
- Award number: 1327658
- Fiscal year: 2013
- Funding amount: $400,000
- Award type: Standard Grant

CAREER: Fast Algorithms for Oscillatory Integrals
- Award number: 1328230
- Fiscal year: 2013
- Funding amount: $400,000
- Award type: Standard Grant

CDI-Type I: Collaborative Research: High-dimensional phase-space subdivisions for seismic imaging
- Award number: 1027952
- Fiscal year: 2010
- Funding amount: $400,000
- Award type: Standard Grant

CAREER: Fast Algorithms for Oscillatory Integrals
- Award number: 0846501
- Fiscal year: 2009
- Funding amount: $400,000
- Award type: Standard Grant

Collaborative Research: Wave Computations in Phase-Space
- Award number: 0708014
- Fiscal year: 2007
- Funding amount: $400,000
- Award type: Standard Grant
Similar international grants

EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Award number: 2404989
- Fiscal year: 2024
- Funding amount: $400,000
- Award type: Standard Grant

CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
- Award number: 2340586
- Fiscal year: 2024
- Funding amount: $400,000
- Award type: Continuing Grant

CAREER: Towards Tight Guarantees of Markov Chain Sampling Algorithms in High Dimensional Statistical Inference
- Award number: 2237322
- Fiscal year: 2023
- Funding amount: $400,000
- Award type: Continuing Grant

Markov chain Monte Carlo algorithms and locally informed proposal distributions
- Award number: RGPIN-2019-04488
- Fiscal year: 2022
- Funding amount: $400,000
- Award type: Discovery Grants Program - Individual

Markov chain Monte Carlo algorithms and locally informed proposal distributions
- Award number: RGPIN-2019-04488
- Fiscal year: 2021
- Funding amount: $400,000
- Award type: Discovery Grants Program - Individual

Functional Analysis of Markov Chain Monte Carlo algorithms
- Award number: 2597521
- Fiscal year: 2021
- Funding amount: $400,000
- Award type: Studentship

New Algorithms and Analyses for Partially Observable Markov Decision Processes
- Award number: RGPIN-2014-04979
- Fiscal year: 2021
- Funding amount: $400,000
- Award type: Discovery Grants Program - Individual

Collaborative Research: AF: Medium: Markov Chain Algorithms for Problems from Computer Science, Statistical Physics and Self-Organizing Particle Systems
- Award number: 2106917
- Fiscal year: 2021
- Funding amount: $400,000
- Award type: Continuing Grant

Collaborative Research: AF: Medium: Markov Chain Algorithms for Problems from Computer Science, Statistical Physics and Self-Organizing Particle Systems
- Award number: 2106687
- Fiscal year: 2021
- Funding amount: $400,000
- Award type: Continuing Grant

Markov chain Monte Carlo algorithms and locally informed proposal distributions
- Award number: RGPIN-2019-04488
- Fiscal year: 2020
- Funding amount: $400,000
- Award type: Discovery Grants Program - Individual