CAREER: Theoretical Foundations of Offline Reinforcement Learning


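Basic Information

Award number:
2141781
Principal investigator:
Nan Jiang
Amount:
$500,000
Host institution:
University of Illinois at Urbana-Champaign
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2022
Funding country:
United States
Project period:
2022-05-01 to 2027-04-30
Project status:
Ongoing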

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2141781&HistoricalAwards=false
Keywords:
CAREER Theoretical Foundations Offline Reinforcement

Project Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Reinforcement learning (RL) is a subarea of Artificial Intelligence (AI) that solves complex decision-making tasks. It has achieved impressive successes in simulator-defined problems, where the RL agent learns via trial-and-error inside a virtual "online" environment. However, it is difficult to apply these online algorithms to real-world problems, as trial-and-error is often expensive or impossible in real life. For example, it is unethical for an RL agent in personalized medicine to test a new treatment strategy that may harm patients, just for the purpose of gathering new information. A promising paradigm for addressing this issue is offline RL, where the agent learns solely from historical data. While the lack of direct interaction with the real environment prevents undesirable real-world consequences, it also gives rise to significant technical challenges in learning. This project aims to develop novel methods to address these challenges, provide a deep theoretical understanding of offline RL, and make significant progress in enabling offline RL in real-life applications such as robotics, adaptive medical treatment, and online recommendation systems. The research development will also be integrated into the project's educational plan, which includes advising underrepresented students and developing new courses and a monograph on reinforcement learning.

The technical aims of the project consist of two thrusts. The first thrust focuses on the problem of model selection: after training is completed, how should we select between candidate policies on a holdout dataset? Model selection enables hyperparameter tuning, which is the backbone of practical machine learning, yet it is notoriously difficult in offline RL due to the multi-stage nature of the problem. The proposal describes a promising approach that builds on the investigator's recent theoretical work on value-function selection. The project will devise empirically effective methods based on the theoretical insights and address practical issues such as poorly fitted candidate functions and data with insufficient coverage. The second thrust considers the theoretical foundation of offline RL training: under what conditions can we guarantee the success of training? The proposal lays out the theoretical landscape of offline RL training and identifies important open questions and opportunities for discovering novel theoretical and algorithmic insights.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
