Offline Statistical Reinforcement Learning with Applications in Precision Health

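Basic Information

Award number:
2113637
Principal investigator:
Wenbin Lu
Amount:
$200,000
Institution:
North Carolina State University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-08-15 to 2024-07-31
Status:
Completed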

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2113637&HistoricalAwards=false
Keywords:
Offline Statistical Reinforcement Learning Applications

Project Abstract

Precision medicine seeks to tailor medical treatment to the individual characteristics of each patient in order to improve patient outcomes. Precision health is a broader concept that includes precision medicine: it covers both the steps individuals can take on their own to protect their health and the steps public health can take. Reinforcement learning (RL) is a powerful technique that allows an agent to learn and take actions in a given environment so as to maximize the cumulative reward it receives. Interest in developing new statistical RL methods for precision health is growing. The potential impacts of this work fall into four areas. First, the project contributes to both semiparametric inference and RL. The theoretical results include non-asymptotic distributional results and risk bounds, obtained with novel empirical process tools. These results will be fundamentally important and generally applicable for studying semiparametric inference empowered by RL. Second, the clinical findings from analyzing electronic medical record (EMR) data will lead to major progress on important clinical questions about treatment recommendations for patients. Third, although EMR and mobile health (mHealth) data are the main applications of this project, the developed methods are general enough to apply to a variety of data sources, including clinical data and economic data. The developed methods are expected to greatly enhance the acquisition and analysis of large-scale data with population heterogeneity for the medical, scientific, and engineering communities. Fourth, the integration of research and education is a key aspect of this project. The PI will develop new courses and improve existing courses on RL and semiparametric inference, will train graduate students, and will reach K-12 education by training high school teachers and students.

Despite the tremendous impact that RL has had in areas such as games and robotics, deploying RL algorithms directly in precision health can be costly, risky, or even infeasible because of significant real-world challenges. The main objective of this proposal is to develop new statistical offline RL methods that handle these challenges, namely flexible and efficient off-policy learning methods and robust and efficient off-policy evaluation methods. Off-policy learning in RL refers to the problem of finding the best target policy, the one that maximizes the value, given samples collected under a possibly different policy. In Aim 1, we will develop an efficient advantage learning framework that makes efficient use of pre-collected data for policy optimization. In Aim 2, we consider the problem of off-policy evaluation, where the objective is to learn the value under a target policy from data collected under a possibly different policy. There is a growing literature on estimating the value under a given policy in off-policy settings. However, very little work has addressed statistical inference for the value, such as hypothesis testing and confidence intervals (CIs), which is the focus of this aim. In Aim 3, we will discuss the plan for analyzing EMR and mHealth data with policy learning and policy evaluation based on the methods proposed in Aims 1 and 2.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
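To make the off-policy evaluation problem in Aim 2 concrete, the following is a minimal sketch, not the project's method. It assumes a simplified one-step (bandit) setting rather than a sequential one, a known behavior policy, and simulated data. It estimates the value of a target policy by inverse propensity weighting and attaches a normal-approximation confidence interval. All function names, the simulated reward model, and both policies are hypothetical and chosen only for illustration.

```python
import math
import random


def behavior_prob(action, x):
    """Hypothetical behavior policy: each of two treatments is chosen with probability 0.5."""
    return 0.5


def target_policy(x):
    """Hypothetical target policy: give treatment 1 when the covariate x is positive."""
    return 1 if x > 0 else 0


def simulate_logged_data(n, seed=0):
    """Simulate (covariate, action, reward) triples logged under the behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        # Assumed reward model: treatment 1 has mean reward x, treatment 0 has mean 0.
        mean_reward = x if a == 1 else 0.0
        r = mean_reward + rng.gauss(0.0, 1.0)
        data.append((x, a, r))
    return data


def ips_value_with_ci(data, z=1.96):
    """Inverse-propensity-weighted value estimate with a normal-approximation CI."""
    terms = []
    for x, a, r in data:
        # Weight is 1/behavior_prob when the logged action matches the target policy, else 0.
        match = 1.0 if a == target_policy(x) else 0.0
        terms.append(match / behavior_prob(a, x) * r)
    n = len(terms)
    estimate = sum(terms) / n
    variance = sum((t - estimate) ** 2 for t in terms) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    logged = simulate_logged_data(20000, seed=0)
    value, lower, upper = ips_value_with_ci(logged)
    print(f"estimated value: {value:.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```

Under the assumed reward model, the true value of this target policy is E[x · 1{x > 0}] = 1/√(2π) ≈ 0.399, so the interval can be checked against a known answer. In sequential settings such as EMR or mHealth data, the weights and the variance calculation are more involved, which is part of what makes inference for the value difficult.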
