
Addressing Algorithmic Unreliability and Dataset Shift in EHR-based Risk Prediction Models

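Basic Information

Grant number:
10679376
Principal investigator:
Likhitha Kolla
Amount:
$47,700
Institution:
UNIVERSITY OF PENNSYLVANIA
Institution country:
United States
Project category:
Fiscal year:
2023
Funding country:
United States
Project period:
2023-06-01 to 2026-05-31
Project status:
Not yet concluded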

Source:
https://reporter.nih.gov/project-details/10679376
Keywords:
Accident and Emergency department Address Advanced Malignant Neoplasm Algorithms Biometry COVID-19 COVID-19 pandemic Caring Case Management Cessation of life Characteristics Clinical Code Collection Compensation Data Data Set Data Sources Deterioration Development Development Plans Early Intervention Electronic Health Record Environment Future Generations Guidelines Health Health system Healthcare Hospital Costs Hospitalization Impairment Incentives Inpatients Institution Interruption Intervention Knowledge Laboratories Machine Learning Maintenance Methods Mission Modeling Needs Assessment Onset of illness Outcome Outpatients Patient Care Patients Performance Physicians Play Policies Policy Maker Population Predictive Analytics Primary Care Process Public Health Qualifying Quarantine Refit Reporting Research Research Personnel Resource Allocation Resources Risk Adjustment Role Scientist Sepsis Series System Testing Time Training Triage United States National Institutes of Health Validation Veterans Health Administration Work analytical tool beneficiary care delivery career development clinical application clinical decision-making clinical predictive model demographics design doctoral student evidence base health care disparity health care service utilization health care settings health service use high risk improved innovation model design model development mortality risk operation outpatient programs pandemic disease pandemic impact prediction algorithm predictive modeling predictive tools prevent provider behavior research and development risk prediction risk prediction model tool

Project Summary

Predictive analytic algorithms built on electronic health record (EHR) inputs, such as patient characteristics, administrative codes, and lab values, are increasingly used in health care settings to direct resources to high-risk patients. Data play an indispensable role in the development and deployment of effective predictive models. The greatest, yet understudied, challenge in maintaining these tools is dataset shift, in which the training data distribution differs from that of the population on which the algorithm is deployed, leading to model deterioration and inaccurate risk predictions. Dataset shift is a pervasive cause of algorithmic unreliability in EHR-based models because inevitable changes in physician behavior and health system operations alter (1) the input distribution (covariate drift) and (2) the relationship between predictors and outcome (concept drift). Sudden changes in healthcare utilization during the COVID-19 pandemic may have affected the data generation process and the performance of clinical predictive models. Our preliminary study showed that decreased collection of patient labs during the COVID-19 quarantine period led to sparse data for important predictors in a single-institution EHR-based mortality risk prediction algorithm, which consequently underpredicted risk for patients with advanced cancers. Despite the increasing use of predictive tools in high-stakes clinical applications and growing recognition of dataset shift, we lack a framework for reasoning about shift and its effects on care delivery, and for proactively addressing shift to maintain performance over time. In Aim 1, we propose to extend prior work on shift to a nationally deployed risk prediction algorithm, the VA Care Assessment Need (CAN) model, used on millions of VA beneficiaries each year.
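The two drift mechanisms named above can be made concrete with a small simulation. This is only a toy sketch, not the project's method: the one-predictor logistic model, its coefficients, the sample sizes, and the helper names `simulate` and `summarize` are all assumptions chosen for illustration. A fixed "deployed" model is scored on three synthetic periods: one matching its training data, one where only the input distribution moves (covariate drift), and one where only the predictor-outcome relationship moves (concept drift).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, x_mean, intercept, slope, seed):
    """Draw n (x, y) pairs with x ~ Normal(x_mean, 1) and
    y ~ Bernoulli(sigmoid(intercept + slope * x))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(x_mean, 1.0)
        y = 1 if rng.random() < sigmoid(intercept + slope * x) else 0
        data.append((x, y))
    return data


# Hypothetical deployed model: coefficients fixed at the training-period truth.
MODEL_INTERCEPT, MODEL_SLOPE = -2.0, 1.0


def summarize(data):
    """Compare the fixed model's mean predicted risk with the observed event rate."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    predicted = sum(sigmoid(MODEL_INTERCEPT + MODEL_SLOPE * x) for x, _ in data) / n
    observed = sum(y for _, y in data) / n
    return {"mean_x": mean_x, "predicted": predicted, "observed": observed}


# Training-like period: inputs and outcome relationship match the model.
train = summarize(simulate(20000, 0.0, -2.0, 1.0, seed=1))
# Covariate drift: P(X) changes (mean input drops), P(Y | X) is unchanged.
covariate = summarize(simulate(20000, -1.0, -2.0, 1.0, seed=2))
# Concept drift: P(X) is unchanged, P(Y | X) changes (higher baseline risk).
concept = summarize(simulate(20000, 0.0, -1.0, 1.0, seed=3))

for name, result in [("train", train), ("covariate", covariate), ("concept", concept)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this toy setup the model is correctly specified, so a pure input shift changes the case mix without breaking average calibration, whereas the concept-drift period shows an observed event rate well above the predicted one. Real EHR models are rarely correctly specified, so covariate changes such as sparse lab collection can still degrade predictions, as the preliminary study above reports.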
The VA CAN model predicts the likelihood of hospitalization within 90 days or 1 year after a primary care encounter to identify high-risk patients who would benefit from additional outpatient interventions. We also investigate covariate and concept drift as two possible mechanisms for COVID-19-associated dataset shift. In Aim 2, we apply an interrupted time series design to study the association between the sudden shift at the onset of the pandemic and case-management decisions. Current solutions to dataset shift have primarily been reactive (i.e., model retraining with recent data) and fail to be robust in new testing environments. In Aim 3, we consider revising the VA CAN model via machine learning and the inclusion of variables that reflect potential drivers of shift. This project is innovative in that it is the first to leverage a rigorous statistical framework to study the extent and mechanisms of shift and to develop proactive guidelines for model maintenance. The training plan is rigorous for Ms. Kolla, an MD-PhD student in biostatistics. She is strongly supported by her department and institution as well as her two highly qualified sponsors: Dr. Jinbo Chen, an expert in EHR-based risk prediction modeling, and Dr. Ravi Parikh, an expert in the implementation of predictive analytics. The proposed research and career development plan will be an essential step towards Ms. Kolla's development as an interdisciplinary and independent physician-scientist.
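The interrupted time series design mentioned for Aim 2 is commonly analyzed with segmented regression, which estimates a level change and a trend change at the interruption. The sketch below is a minimal illustration under stated assumptions, not the project's analysis: the function names `solve_linear` and `fit_its`, the interruption index `T0`, and the synthetic monthly series are all hypothetical, and the fit is plain ordinary least squares with no adjustment for autocorrelation or seasonality, which a real analysis would normally address.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_its(y, t0):
    """Least-squares fit of y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t,
    where post_t = 1 for t >= t0 and 0 otherwise."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, (t - t0) * post])
    k = 4
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear(xtx, xty)
    return {
        "baseline_level": b0,
        "baseline_trend": b1,
        "level_change": b2,   # immediate jump at the interruption
        "trend_change": b3,   # change in slope after the interruption
    }


# Synthetic, noise-free monthly series (made-up numbers): a drop of 20 at
# month 12, followed by a steeper upward trend.
T0 = 12
series = [
    100 + 0.5 * t + ((-20 + 1.0 * (t - T0)) if t >= T0 else 0.0)
    for t in range(24)
]
fit = fit_its(series, T0)
print({name: round(value, 3) for name, value in fit.items()})
```

Here `level_change` captures the immediate jump in the outcome at the interruption and `trend_change` the difference between the post- and pre-interruption slopes. Because the synthetic series is noise-free, the fit recovers the generating values; real data would also call for uncertainty estimates.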
