Addressing Algorithmic Unreliability and Dataset Shift in EHR-based Risk Prediction Models

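Basic information

  • Grant number:
    10679376
  • Principal investigator:
  • Amount:
    $47,700
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Project period:
    2023-06-01 to 2026-05-31
  • Project status:
    Not yet concluded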

Project abstract

Predictive analytic algorithms built on electronic health record (EHR) inputs, such as patient characteristics, administrative codes, and lab values, are increasingly used in health care settings to direct resources to high-risk patients. Data play an indispensable role in the development and deployment of effective predictive models. The greatest, yet understudied, challenge in maintaining these tools is a data-related concern, namely dataset shift, in which the training data distribution differs from that of the population on which the algorithm is deployed, leading to model deterioration and inaccurate risk predictions. Dataset shift is a pervasive cause of algorithmic unreliability in EHR-based models because inevitable changes in physician behaviors and health system operations alter (1) the input distribution (covariate drift) and (2) the relationship between predictors and outcome (concept drift). Sudden changes in healthcare utilization during the COVID-19 pandemic may have affected the data generation process and the performance of clinical predictive models. Our preliminary study showed that decreased collection of patient labs during the COVID-19 quarantine period led to sparse data for important predictors in a single-institution EHR-based mortality risk prediction algorithm, which then underpredicted risk for patients with advanced cancers. Despite the increasing use of predictive tools in high-stakes clinical applications and growing recognition of dataset shift, we lack a framework for reasoning about shift and its effects on care delivery, and for proactively addressing shift to maintain performance over time.

In Aim 1, we propose to extend prior work on shift to a nationally deployed risk prediction algorithm, the VA Care Assessment Need (CAN) model, used on millions of VA beneficiaries each year. The VA CAN model predicts the likelihood of hospitalization within 90 days or 1 year after a primary care encounter to identify high-risk patients who would benefit from additional outpatient interventions. We also investigate covariate and concept drift as two possible mechanisms for COVID-19-associated dataset shift. In Aim 2, we apply an interrupted time series design to study the association between the sudden shift at the onset of the pandemic and case-management decisions. Current solutions to dataset shift have primarily been reactive (i.e., model retraining with recent data) and fail to be robust in new testing environments. In Aim 3, we consider revising the VA CAN model via machine learning and the inclusion of variables that reflect potential drivers of shift. This project is innovative as it is the first to leverage a rigorous statistical framework to study the extent and mechanisms of shift and to develop proactive guidelines for model maintenance.

The training plan is rigorous for Ms. Kolla, an MD-PhD student in biostatistics. She is strongly supported by her department and institution as well as her two highly qualified sponsors: Dr. Jinbo Chen, an expert in EHR-based risk prediction modeling, and Dr. Ravi Parikh, an expert in implementation of predictive analytics. The proposed research and career development plan will be an essential step towards Ms. Kolla’s development as an interdisciplinary and independent physician-scientist.
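As a rough illustration of the two drift mechanisms named in the abstract, the Python sketch below compares a pre-shift and a post-shift period using simple summary checks. It is not the project's method: the function names, the choice of checks (missingness rate, standardized mean difference, observed-to-expected event ratio), and the toy values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def missing_rate(values):
    """Fraction of entries that are None (e.g. a lab that was never collected)."""
    return sum(v is None for v in values) / len(values)


def standardized_mean_difference(before, after):
    """Covariate drift check: shift in the mean of an input, in pooled-SD units.

    Missing values (None) are ignored here; their rate is tracked separately.
    """
    b = [v for v in before if v is not None]
    a = [v for v in after if v is not None]
    pooled_sd = sqrt((pstdev(b) ** 2 + pstdev(a) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd if pooled_sd > 0 else 0.0


def observed_to_expected(predicted_risks, outcomes):
    """Concept drift check: observed events divided by expected events.

    A ratio near 1 suggests the model is calibrated on average;
    a ratio above 1 means the model underpredicts risk.
    """
    return sum(outcomes) / sum(predicted_risks)


# Hypothetical toy data: one lab value per patient, before and after a shift.
pre_lab = [3.9, 4.1, 4.0, 3.8, 4.2, 4.0]
post_lab = [3.5, None, None, 3.4, None, 3.6]

# Hypothetical predicted risks and observed outcomes (1 = event occurred).
risks = [0.1, 0.2, 0.3, 0.4]
pre_outcomes = [0, 0, 0, 1]
post_outcomes = [0, 1, 0, 1]

print("missing rate, pre vs post:", missing_rate(pre_lab), missing_rate(post_lab))
print("standardized mean difference:", standardized_mean_difference(pre_lab, post_lab))
print("O/E ratio, pre vs post:",
      observed_to_expected(risks, pre_outcomes),
      observed_to_expected(risks, post_outcomes))
```

In this toy setup, a higher post-period missingness rate and a nonzero standardized mean difference point to a change in the inputs, while an observed-to-expected ratio that rises above 1 with unchanged predicted risks points to a change in the predictor-outcome relationship.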

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Likhitha Kolla

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $47,700
  • Project category:
    Research Grant