CIF: Small: Reinforcement Learning with Function Approximation: Convergent Algorithms and Finite-sample Analysis
Basic Information
- Award Number: 2007783
- Principal Investigator:
- Amount: $330,000
- Host Institution:
- Host Institution Country: United States
- Program Type: Standard Grant
- Fiscal Year: 2020
- Funding Country: United States
- Project Period: 2020-10-01 to 2024-09-30
- Status: Completed
- Source:
- Keywords:
Project Abstract
The recent success of a machine-learning technique called reinforcement learning on benchmark tasks suggests potentially revolutionary advances in practical applications, and has dramatically boosted interest in this technique. However, common algorithms of this kind are highly data-inefficient, delivering impressive results only on simulated systems, where an effectively unlimited amount of data can be generated. For example, on online tasks that most humans pick up within a few minutes, reinforcement learning algorithms take far longer to reach human-level performance. A strong reinforcement learning algorithm called "Rainbow deep Q-network" needs about 18 million frames of simulation data to beat human performance on even the simplest of these tasks, which corresponds to about 80 person-hours of online experience. Such data requirements limit the use of reinforcement learning in the many practical applications where only a limited amount of data is available, and theoretical understanding of how much data effective reinforcement learning needs is still very limited. This project aims to reduce the data required to train reinforcement learning algorithms by developing a comprehensive methodology for algorithm design and convergence-rate analysis, which will in turn motivate the design of fast and stable reinforcement learning algorithms. The project will have a direct impact on various engineering and science applications, e.g., financial markets, business strategy planning, industrial automation, and online advertising.
This project will take a fresh perspective, using tools and concepts from both optimization and reinforcement learning. The following thrusts will be investigated in increasing order of difficulty. 1) Linear function approximation: tools and insights will be developed to tackle the challenges of non-smoothness and non-convexity in control problems. 2) General function approximation: the new challenge of non-linearity will be addressed. 3) Neural function approximation: convergence to globally and/or universally optimal solutions will be investigated. In each of the three thrusts, new algorithms will be designed and their convergence rates characterized. These results will further serve as guidelines for parameter tuning and motivate the design of fast and convergent algorithms. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
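To make the setting of Thrust 1 concrete, the following is a minimal sketch of TD(0) policy evaluation with linear function approximation, the classical workhorse of this regime. The 5-state Markov chain, random feature map, discount factor, and step size below are illustrative assumptions, not details taken from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, d = 5, 3
phi = rng.standard_normal((n_states, d))             # phi[s] is the feature vector of state s
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix under a fixed policy
r = rng.standard_normal(n_states)                    # expected reward per state
gamma, alpha = 0.9, 0.05                             # discount factor and step size

theta = np.zeros(d)  # linear value estimate: V(s) ≈ phi[s] @ theta
s = 0
for _ in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) semi-gradient update on the temporal-difference error
    td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_error * phi[s]
    s = s_next

print(theta)  # approaches the fixed point of the projected Bellman operator
```

With on-policy sampling, this update is known to converge to the projected Bellman fixed point; the finite-sample behavior of such updates, and their extensions to control (where non-smoothness and non-convexity arise), is exactly what the project proposes to analyze.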
Project Outcomes
Journal Articles (13)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Policy Gradient Method For Robust Reinforcement Learning
- DOI: 10.48550/arxiv.2205.07344
- Publication Date: 2022-05
- Journal:
- Impact Factor: 0
- Authors: Yue Wang; Shaofeng Zou
- Corresponding Authors: Yue Wang; Shaofeng Zou
Robust Average-Reward Markov Decision Processes
- DOI: 10.1609/aaai.v37i12.26775
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Wang, Yue; Velasquez, Alvaro; Atia, George; Prater-Bennette, Ashley; Zou, Shaofeng
- Corresponding Author: Zou, Shaofeng
A Robust and Constrained Multi-Agent Reinforcement Learning Electric Vehicle Rebalancing Method in AMoD Systems
- DOI: 10.1109/iros55552.2023.10342342
- Publication Date: 2022-09
- Journal:
- Impact Factor: 0
- Authors: Sihong He; Yue Wang; Shuo Han; Shaofeng Zou; Fei Miao
- Corresponding Authors: Sihong He; Yue Wang; Shuo Han; Shaofeng Zou; Fei Miao
Model-Free Robust Average-Reward Reinforcement Learning
- DOI: 10.48550/arxiv.2305.10504
- Publication Date: 2023-05
- Journal:
- Impact Factor: 0
- Authors: Yue Wang; Alvaro Velasquez; George K. Atia; Ashley Prater-Bennette; Shaofeng Zou
- Corresponding Authors: Yue Wang; Alvaro Velasquez; George K. Atia; Ashley Prater-Bennette; Shaofeng Zou
Data-Driven Robust Multi-Agent Reinforcement Learning
- DOI: 10.1109/mlsp55214.2022.9943500
- Publication Date: 2022-08
- Journal:
- Impact Factor: 0
- Authors: Yudan Wang; Yue Wang; Yi Zhou; Alvaro Velasquez; Shaofeng Zou
- Corresponding Authors: Yudan Wang; Yue Wang; Yi Zhou; Alvaro Velasquez; Shaofeng Zou
Other Publications by Shaofeng Zou
Model-Free Robust Reinforcement Learning with Sample Complexity Analysis
- DOI:
- Publication Date: 2024
- Journal:
- Impact Factor: 0
- Authors: Yudan Wang; Shaofeng Zou; Yue Wang
- Corresponding Author: Yue Wang
Layered decoding and secrecy over degraded broadcast channels
- DOI:
- Publication Date: 2013
- Journal:
- Impact Factor: 0
- Authors: Shaofeng Zou; Yingbin Liang; L. Lai; S. Shamai
- Corresponding Author: S. Shamai
Asymptotic optimality of D-CuSum for quickest change detection under transient dynamics
- DOI:
- Publication Date: 2017
- Journal:
- Impact Factor: 0
- Authors: Shaofeng Zou; Georgios Fellouris; V. Veeravalli
- Corresponding Author: V. Veeravalli
Broadcast Networks With Layered Decoding and Layered Secrecy: Theory and Applications
- DOI:
- Publication Date: 2015
- Journal:
- Impact Factor: 20.6
- Authors: Shaofeng Zou; Yingbin Liang; L. Lai; H. Poor; S. Shamai
- Corresponding Author: S. Shamai
K-user degraded broadcast channel with secrecy outside a bounded range
- DOI:
- Publication Date: 2016
- Journal:
- Impact Factor: 0
- Authors: Shaofeng Zou; Yingbin Liang; L. Lai; H. Poor; S. Shamai
- Corresponding Author: S. Shamai
Other Grants by Shaofeng Zou
CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits
- Award Number: 2337375
- Fiscal Year: 2024
- Funding Amount: $330,000
- Program Type: Continuing Grant
Collaborative Research: CIF: Medium: Emerging Directions in Robust Learning and Inference
- Award Number: 2106560
- Fiscal Year: 2021
- Funding Amount: $330,000
- Program Type: Continuing Grant
CCSS: Collaborative Research: Quickest Threat Detection in Adversarial Sensor Networks
- Award Number: 2112693
- Fiscal Year: 2021
- Funding Amount: $330,000
- Program Type: Standard Grant
CRII: CIF: Dynamic Network Event Detection with Time-Series Data
- Award Number: 1948165
- Fiscal Year: 2020
- Funding Amount: $330,000
- Program Type: Standard Grant
Similar NSFC Grants
Structural basis of the SERT-nNOS protein interaction and design, synthesis, and rapid antidepressant activity of small-molecule interaction inhibitors
- Award Number: 82373728
- Year Approved: 2023
- Funding Amount: ¥490,000
- Program Type: General Program
Role and mechanism of APOE-regulated microglial lipid metabolism in cognitive and social impairment in ASD
- Award Number: 82373597
- Year Approved: 2023
- Funding Amount: ¥490,000
- Program Type: General Program
Mechanism by which microglial exosomes inhibit neuronal ferroptosis via miR-486 to mediate electroacupuncture repair of spinal cord injury
- Award Number: 82360454
- Year Approved: 2023
- Funding Amount: ¥320,000
- Program Type: Regional Science Fund Program
Mechanism by which CUL4B positive-feedback regulation of the FOXO3a-FOXM1 pathway promotes radiotherapy resistance in non-small cell lung cancer
- Award Number: 82360584
- Year Approved: 2023
- Funding Amount: ¥320,000
- Program Type: Regional Science Fund Program
Molecular mechanism by which the AMPK-CREB-PPA1 signaling pathway promotes non-small cell lung cancer cell proliferation under glucose starvation
- Award Number: 82360518
- Year Approved: 2023
- Funding Amount: ¥320,000
- Program Type: Regional Science Fund Program
Similar Overseas Grants
CIF: SMALL: Theoretical Foundations of Partially Observable Reinforcement Learning: Minimax Sample Complexity and Provably Efficient Algorithms
- Award Number: 2315725
- Fiscal Year: 2023
- Funding Amount: $330,000
- Program Type: Standard Grant
CIF: Small: Inverse Reinforcement Learning for Cognitive Sensing
- Award Number: 2312198
- Fiscal Year: 2023
- Funding Amount: $330,000
- Program Type: Standard Grant
CIF: Small: How Much of Reinforcement Learning is Gradient Descent?
- Award Number: 2245059
- Fiscal Year: 2023
- Funding Amount: $330,000
- Program Type: Standard Grant
CIF: Small: Adversarially Robust Reinforcement Learning: Attack, Defense, and Analysis
- Award Number: 2232907
- Fiscal Year: 2023
- Funding Amount: $330,000
- Program Type: Standard Grant
CIF: Small: Accelerating Stochastic Approximation for Optimization and Reinforcement Learning
- Award Number: 2306023
- Fiscal Year: 2023
- Funding Amount: $330,000
- Program Type: Standard Grant