CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits

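Basic Information

Award number:
2337375
Principal investigator:
Shaofeng Zou
Amount:
$520K
Institution:
SUNY at Buffalo
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Start and end dates:
2024-09-01 to 2029-08-31
Project status:
Ongoing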

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2337375&HistoricalAwards=false
Keywords:
CAREER Robust Reinforcement Learning Under

Project Abstract

Existing reinforcement learning (RL) approaches usually assume that a learned policy will be deployed in the same environment in which it was trained. This assumption is often violated in practice due to, e.g., adversarial perturbations, modeling error between the simulator and real-world applications, non-stationary environments, and a limited amount of training data. The discrepancy between the training and test environments gives rise to a model mismatch, which leads to a notable decline in performance and restricts the suitability of RL in crucial domains, e.g., healthcare, critical infrastructure, transportation systems, and smart cities. To address this challenge, there have been noteworthy efforts to develop distributionally robust RL approaches. This CAREER project aims to advance the fundamental algorithmic and theoretical limits of distributionally robust RL. The research outcome of this project holds the promise to push the algorithmic and theoretical boundaries of robust RL and will deliver provably convergent, efficient, and minimax optimal robust RL algorithms. The project will have a significant impact on the theory and practice of sequential decision making in various domains, e.g., special education, intelligent transportation systems, wireless communication networks, power systems, and drone networks. The activities in this project will provide concrete principles and design guidelines for achieving robustness in the face of model uncertainty.
The integration of research into education and outreach will target K-12 educators and graduate, undergraduate, and underrepresented students through (i) an Artificial Intelligence (AI) summer camp for K-12 educators; (ii) a Buffalo Day workshop; (iii) curriculum development; and (iv) student supervision.
The research efforts are organized around three complementary thrusts: (i) Thrust A focuses on developing theoretical and algorithmic foundations for distributionally robust RL under the long-term average-reward criterion; (ii) Thrust B focuses on developing a unified framework of distributional robustness for learning (robust) policies from offline datasets without active data acquisition and exploration, and on further uncovering their fundamental limits; (iii) Thrust C focuses on constructive approaches and fundamental limits of robust RL under constraints, i.e., optimizing reward while simultaneously guaranteeing constraints under model uncertainty. This project will develop a fundamental understanding of robust RL, minimax optimal robust RL algorithms, and novel convergence and complexity analyses. The research outcome will significantly improve the robustness of RL algorithms and will be of interest to a broad range of communities, e.g., machine learning, statistics, information theory, networking, communication, power, and education. The proposed work will also foster new interdisciplinary research directions across these research communities.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
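
The abstract does not specify any algorithm, so the following is only an illustrative sketch of what robustness to model mismatch can mean in practice. It assumes a small tabular problem under the discounted-reward criterion (not the average-reward criterion of Thrust A) and a finite set of candidate transition models standing in for the uncertainty set. The function name `robust_value_iteration`, the toy rewards, and the two transition models are all hypothetical and are not taken from the project.

```python
import numpy as np


def robust_value_iteration(models, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration over a finite set of transition models.

    models:  shape (K, S, A, S); models[k, s, a] is a next-state distribution.
    rewards: shape (S, A).
    Returns the robust value function and a greedy robust policy.
    """
    models = np.asarray(models, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    v = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # Expected next-state value under every candidate model: shape (K, S, A).
        expected = models @ v
        # Assumption: the adversary may pick a different model for each
        # state-action pair (a "rectangular" uncertainty set).
        worst_case = expected.min(axis=0)
        v_new = (rewards + gamma * worst_case).max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = rewards + gamma * (models @ v).min(axis=0)
    return v, q.argmax(axis=1)


# Hypothetical toy problem: state 0 is "good", state 1 is "bad";
# action 0 is cautious, action 1 is risky but pays more in the good state.
rewards = np.array([[1.0, 2.0],
                    [0.0, 0.0]])
nominal = np.array([[[1.0, 0.0], [0.9, 0.1]],
                    [[0.5, 0.5], [0.5, 0.5]]])
perturbed = nominal.copy()
perturbed[0, 1] = [0.5, 0.5]  # the risky action fails more often than modeled

v_nominal, pi_nominal = robust_value_iteration(nominal[None], rewards)
v_robust, pi_robust = robust_value_iteration(np.stack([nominal, perturbed]), rewards)
print("nominal values:", v_nominal, "policy:", pi_nominal)
print("robust values: ", v_robust, "policy:", pi_robust)
```

Passing a single model recovers ordinary value iteration, so the two calls can be compared directly: the robust values can never exceed the nominal ones, because the worst case is taken over a set that contains the nominal model.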