CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits
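Basic information
- Award number: 2337375
- Principal investigator:
- Amount: $520,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-09-01 to 2029-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract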
Existing reinforcement learning (RL) approaches usually assume that a learned policy will be deployed in the same environment it was trained in. This assumption is often violated in practice due to, e.g., adversarial perturbations, modeling errors between simulators and real-world applications, non-stationary environments, and limited training data. The discrepancy between the training and test environments gives rise to a model mismatch, which leads to a notable decline in performance and restricts the suitability of RL in crucial domains, e.g., healthcare, critical infrastructure, transportation systems, and smart cities. To address this challenge, there have been noteworthy efforts to develop distributionally robust RL approaches. This CAREER project aims to advance the fundamental algorithmic and theoretical limits of distributionally robust RL. The research outcome of this project holds the promise to push the algorithmic and theoretical boundaries of robust RL, and will deliver provably convergent, efficient, and minimax optimal robust RL algorithms. The project will have a significant impact on the theory and practice of sequential decision making in various domains, e.g., special education, intelligent transportation systems, wireless communication networks, power systems, and drone networks. The activities in this project will provide concrete principles and design guidelines to achieve robustness in the face of model uncertainty.
The integration of research into education and outreach will target K-12 educators and graduate, undergraduate, and underrepresented students through (i) an Artificial Intelligence (AI) summer camp for K-12 educators; (ii) a Buffalo Day workshop; (iii) curriculum development; and (iv) student supervision.
The research efforts are organized around three complementary thrusts: (i) Thrust A focuses on developing theoretical and algorithmic foundations for distributionally robust RL under the long-term average-reward criterion; (ii) Thrust B focuses on developing a unified framework of distributional robustness for learning (robust) policies from offline datasets without active data acquisition and exploration, and on uncovering their fundamental limits; (iii) Thrust C focuses on constructive approaches and fundamental limits of robust RL under constraints, i.e., optimizing reward while simultaneously guaranteeing constraints under model uncertainty. This project will develop a fundamental understanding of robust RL, minimax optimal robust RL algorithms, and novel technical convergence and complexity analyses. The research outcome will significantly improve the robustness of RL algorithms and will be of interest to a broad range of communities, e.g., machine learning, statistics, information theory, networking, communication, power, and education. The proposed work will also foster new interdisciplinary research directions across these research communities.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Shaofeng Zou
Model-Free Robust Reinforcement Learning with Sample Complexity Analysis
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Yudan Wang; Shaofeng Zou; Yue Wang
- Corresponding author: Yue Wang
Near-infrared quantum cutting in Bi3+/Yb3+ co-doped oxyfluoride glasses via cooperative energy transfer for solar cells
- DOI: 10.1016/j.optmat.2014.10.047
- Published: 2014-12
- Journal:
- Impact factor: 3.9
- Authors: Weirong Wang; Shaofeng Zou; Xiao Lei; Huiping Gao; Yanli Mao*
- Corresponding author: Yanli Mao*
Nonparametric Anomaly Detection and Secure Communication
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Shaofeng Zou
- Corresponding author: Shaofeng Zou
An Information Theoretic Approach to Secret Sharing
- DOI: 10.1109/tit.2015.2421905
- Published: 2014
- Journal:
- Impact factor: 2.5
- Authors: Shaofeng Zou; Yingbin Liang; L. Lai; S. Shamai
- Corresponding author: S. Shamai
A kernel-based nonparametric test for anomaly detection over line networks
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Shaofeng Zou; Yingbin Liang; H. Poor
- Corresponding author: H. Poor
Other grants by Shaofeng Zou
Collaborative Research: CIF: Medium: Emerging Directions in Robust Learning and Inference
- Award number: 2106560
- Fiscal year: 2021
- Amount: $520,000
- Project type: Continuing Grant
CCSS: Collaborative Research: Quickest Threat Detection in Adversarial Sensor Networks
- Award number: 2112693
- Fiscal year: 2021
- Amount: $520,000
- Project type: Standard Grant
CRII: CIF: Dynamic Network Event Detection with Time-Series Data
- Award number: 1948165
- Fiscal year: 2020
- Amount: $520,000
- Project type: Standard Grant
CIF: Small: Reinforcement Learning with Function Approximation: Convergent Algorithms and Finite-sample Analysis
- Award number: 2007783
- Fiscal year: 2020
- Amount: $520,000
- Project type: Standard Grant
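Similar grants from the National Natural Science Foundation of China
Research on robust strategy analysis and robust optimization methods in supply chain management
- Award number: 70601028
- Approval year: 2006
- Amount: 70,000 CNY
- Project type: Young Scientists Fund
Research on robust speech recognition methods under psychological tension and stress
- Award number: 60085001
- Approval year: 2000
- Amount: 140,000 CNY
- Project type: Special Fund Project
Research on robust speech recognition methods
- Award number: 69075008
- Approval year: 1990
- Amount: 35,000 CNY
- Project type: General Program
Improved robust sequential detection techniques
- Award number: 68671030
- Approval year: 1986
- Amount: 20,000 CNY
- Project type: General Program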
Similar overseas grants
CPS: Medium: Collaborative Research: Provably Safe and Robust Multi-Agent Reinforcement Learning with Applications in Urban Air Mobility
- Award number: 2312092
- Fiscal year: 2023
- Amount: $520,000
- Project type: Standard Grant
Robust and Efficient Model-based Reinforcement Learning
- Award number: EP/X03917X/1
- Fiscal year: 2023
- Amount: $520,000
- Project type: Research Grant
Exploring Causality in Reinforcement Learning for Robust Decision Making
- Award number: EP/Y003187/1
- Fiscal year: 2023
- Amount: $520,000
- Project type: Research Grant
CIF: Small: Adversarially Robust Reinforcement Learning: Attack, Defense, and Analysis
- Award number: 2232907
- Fiscal year: 2023
- Amount: $520,000
- Project type: Standard Grant
CPS: Medium: Collaborative Research: Provably Safe and Robust Multi-Agent Reinforcement Learning with Applications in Urban Air Mobility
- Award number: 2312093
- Fiscal year: 2023
- Amount: $520,000
- Project type: Standard Grant
CPS: Medium: Collaborative Research: Provably Safe and Robust Multi-Agent Reinforcement Learning with Applications in Urban Air Mobility
- Award number: 2312094
- Fiscal year: 2023
- Amount: $520,000
- Project type: Standard Grant
Robust Decision-Aware Model-based Reinforcement Learning
- Award number: RGPIN-2021-03701
- Fiscal year: 2022
- Amount: $520,000
- Project type: Discovery Grants Program - Individual
CAREER: Soft-robust Methods for Offline Reinforcement Learning
- Award number: 2144601
- Fiscal year: 2022
- Amount: $520,000
- Project type: Continuing Grant
Distributionally Robust Adaptive Control: Enabling Safe and Robust Reinforcement Learning
- Award number: 2135925
- Fiscal year: 2022
- Amount: $520,000
- Project type: Standard Grant
CDS&E: Reinforcement learning for robust wall models in large-eddy simulations
- Award number: 2152705
- Fiscal year: 2022
- Amount: $520,000
- Project type: Standard Grant