A data science framework for transforming electronic health records into real-world evidence

Project summary

Randomized controlled trials (RCTs) are the gold standard in clinical research but are subject to many limitations, including high costs, limited generalizability, and small sample sizes in patient subgroups. By contrast, electronic health records (EHRs) are widely available and contain information on large and representative patient cohorts. However, because they capture the uncontrolled observations of many clinicians, they are highly susceptible to bias. The recent availability of raw data from RCTs has created a unique opportunity to integrate them with EHR data and to develop methods that exploit the distinct advantages of each dataset.

We propose to identify the zone of overlap between these data and build bridges between their data representations. These bridges could enable us to better emulate randomized trials using EHR data and measure the same effects seen in the trials. Consequently, they would allow us to study subgroups that were excluded from the pivotal trials associated with new drug approvals by the FDA.

We will test these ideas in the context of ulcerative colitis (UC) and scale to other diseases in future work. We have obtained access to the raw data from 12 RCTs in UC (N=6,226). These data contain timed and structured measurements of disease activity, including the Mayo score, a composite score of patient symptoms and endoscopic severity. We have also obtained access to the EHR data of 3,270 UC patients treated at the University of California San Francisco. These data contain information similar to that in the RCTs, but largely in unstructured form. In addition, these assessments tend to be incomplete relative to trials because of the cost and invasiveness of some tests. We will address this problem of unharmonized and incomplete EHR data in three aims.

In Aim 1, we will harmonize the RCT data into an analysis-ready format. We will also develop text classification tools to transform free-text EHR data into Mayo subscores, and validate these tools against data from a second center. In Aim 2, we will integrate the RCT and EHR data, train algorithms to impute RCT-based representations of the patient state from partial measurements made in EHRs, and test them under conditions typifying real-world data capture. In Aim 3, we will use these algorithms to harmonize EHR data, validate them as a tool to recover the same effects as RCTs, and study new patient subgroups.

The applicant will carry out these aims and train in biostatistics, natural language processing, machine learning, and overall career development. With the help of his mentors, he will launch a career dedicated to developing and disseminating methods for learning from complex clinical data, and in so doing, promote a future of better healthcare for all patients.
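
The summary describes the Mayo score as a composite of patient symptoms and endoscopic severity, and notes that EHR assessments are often incomplete. The following is a minimal sketch of that situation, not the project's implementation. It assumes the conventional four-component Mayo definition (stool frequency, rectal bleeding, endoscopic findings, and physician's global assessment, each scored 0 to 3), which the summary itself does not spell out. The class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MayoSubscores:
    """Hypothetical container: each subscore is 0-3, or None if not recorded."""
    stool_frequency: Optional[int] = None
    rectal_bleeding: Optional[int] = None
    endoscopy: Optional[int] = None
    physician_global: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-3 range for any recorded subscore.
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None and value not in (0, 1, 2, 3):
                raise ValueError(f"{f.name} must be 0-3 or None, got {value!r}")


def full_mayo(s: MayoSubscores) -> Optional[int]:
    """Sum of all four subscores (0-12); None when any component is missing."""
    values = [getattr(s, f.name) for f in fields(s)]
    if any(v is None for v in values):
        return None
    return sum(values)


def missing_components(s: MayoSubscores) -> List[str]:
    """Names of the subscores that were not recorded at this visit."""
    return [f.name for f in fields(s) if getattr(s, f.name) is None]


# A trial visit records every component; an EHR visit may lack endoscopy.
trial_visit = MayoSubscores(2, 1, 3, 2)
ehr_visit = MayoSubscores(stool_frequency=2, rectal_bleeding=1)
```

Under these assumptions, `full_mayo(trial_visit)` yields a complete score, while `full_mayo(ehr_visit)` returns `None` and `missing_components(ehr_visit)` lists what an imputation step of the kind described in Aim 2 would need to estimate.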

Project outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Algorithmic Identification of Treatment-Emergent Adverse Events From Clinical Notes Using Large Language Models: A Pilot Study in Inflammatory Bowel Disease.
  • DOI:
    10.1002/cpt.3226
  • Published:
    2024
  • Journal:
  • Impact factor:
    6.7
  • Authors:
    Silverman, Anna L; Sushil, Madhumita; Bhasuran, Balu; Ludwig, Dana; Buchanan, James; Racz, Rebecca; Parakala, Mahalakshmi; El-Kamary, Samer; Ahima, Ohenewaa; Belov, Artur; Choi, Lauren; Billings, Monisha; Li, Yan; Habal, Nadia; Liu, Qi; Tiwari, Jawahar; B
  • Corresponding author:
    B
Assessing the Impact of COVID-19 on IBD Outcomes Among Vulnerable Patient Populations in a Large Metropolitan Center.
  • DOI:
    10.1093/ibd/izad041
  • Published:
    2023-03
  • Journal:
  • Impact factor:
    4.9
  • Authors:
    F. Odufalu; Justin L Sewell; Vivek A. Rudrapatna; M. Somsouk; U. Mahadevan
  • Corresponding author:
    F. Odufalu;Justin L Sewell;Vivek A. Rudrapatna;M. Somsouk;U. Mahadevan

Other publications by Vivek A Rudrapatna

Robust measurement of the real world effectiveness of Tofacitinib for the treatment of Ulcerative Colitis using electronic health records: a protocol and statistical analysis plan v1
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Vivek A Rudrapatna; Atul J. Butte
  • Corresponding author:
    Atul J. Butte

Similar overseas grants

Shared and Distributed Memory Parallel Pre-Conditioning and Acceleration Algorithms for "Spline-Enhanced" Spatial Discretisations
  • Grant number:
    2907459
  • Fiscal year:
    2023
  • Funding amount:
    $89,000
  • Project type:
    Studentship
Efficient algorithms and succinct data structures for acceleration of telescoping and related problems
  • Grant number:
    RGPIN-2021-03147
  • Fiscal year:
    2022
  • Funding amount:
    $89,000
  • Project type:
    Discovery Grants Program - Individual
Acceleration framework for training deep learning by cooperative with algorithms and computer architectures
  • Grant number:
    21K17768
  • Fiscal year:
    2021
  • Funding amount:
    $89,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Efficient algorithms and succinct data structures for acceleration of telescoping and related problems
  • Grant number:
    RGPIN-2021-03147
  • Fiscal year:
    2021
  • Funding amount:
    $89,000
  • Project type:
    Discovery Grants Program - Individual
Material and Device Building Blocks for Hardware Acceleration of Machine Learning and Artificial Intelligence Algorithms
  • Grant number:
    2004791
  • Fiscal year:
    2020
  • Funding amount:
    $89,000
  • Project type:
    Continuing Grant
CIF: Small: Collaborative Research: Acceleration Algorithms for Large-scale Nonconvex Optimization
  • Grant number:
    1909291
  • Fiscal year:
    2019
  • Funding amount:
    $89,000
  • Project type:
    Standard Grant
Acceleration of trigger algorithms with FPGAs at the LHC implemented using higher-level programming languages
  • Grant number:
    ST/S005560/1
  • Fiscal year:
    2019
  • Funding amount:
    $89,000
  • Project type:
    Training Grant
CIF: Small: Collaborative Research: Acceleration Algorithms for Large-scale Nonconvex Optimization
  • Grant number:
    1909298
  • Fiscal year:
    2019
  • Funding amount:
    $89,000
  • Project type:
    Standard Grant
Acceleration of trigger algorithms with FPGAs at the LHC implemented using higher-level programming languages
  • Grant number:
    2348748
  • Fiscal year:
    2019
  • Funding amount:
    $89,000
  • Project type:
    Studentship
OAC Core: Small: Enabling High-fidelity Turbulent Reacting-Flow Simulations through Advanced Algorithms, Code Acceleration, and High-order Methods for Extreme-scale Computing
  • Grant number:
    1909379
  • Fiscal year:
    2019
  • Funding amount:
    $89,000
  • Project type:
    Standard Grant