A data science framework for transforming electronic health records into real-world evidence

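Basic Information

Award number:
10664706
Principal investigator:
Vivek A Rudrapatna
Amount:
approx. $89,000
Institution:
UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
Institution country:
United States
Project category:
Fiscal year:
2023
Funding country:
United States
Project period:
2023-08-03 to 2025-07-31
Project status:
Not yet concluded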

Source:
https://reporter.nih.gov/project-details/10664706
Keywords:
3-Dimensional Acceleration Algorithms Bayesian Network Benchmarking Biometry California Chronic Classification Clinic Visits Clinical Data Clinical Research Clinical Trials Complex Data Data Reporting Data Science Data Set Dedications Disease Drug Approval E-learning Effectiveness Elderly Electronic Health Record Eligibility Determination Endoscopy Equity Exclusion Food and Drug Administration Drug Approval Future Goals Healthcare Immune System Diseases Joints Learning Machine Learning Malignant Neoplasms Masks Measurement Measures Mentors Methods Modeling Natural Language Processing Nature New Drug Approvals Patient Representative Patients Pattern Performance Pharmaceutical Preparations Population Predisposition Pregnancy Publishing Race Randomized, Controlled Trials Recording of previous events Research Subjects Sample Size San Francisco Selection for Treatments Semantics Severities Source Structure Subgroup Symptoms Testing Text Time Training Treatment Effectiveness Ulcerative Colitis Uncertainty Universities Work algorithm training career career development cohort computerized tools cost data harmonization data integration electronic structure heterogenous data improved in silico innovation insight learning strategy meetings outcome prediction patient health information patient subsets randomized trial reconstruction support tools tool treatment effect

Project Summary

Randomized controlled trials (RCTs) are the gold standard in clinical research, but they are subject to many limitations, including high costs, limited generalizability, and small sample sizes in patient subgroups. By contrast, electronic health records (EHRs) are widely available and contain information on large and representative patient cohorts. However, because they capture the uncontrolled observations of many clinicians, they are highly susceptible to bias. The recent availability of raw data from RCTs has created a unique opportunity to integrate them with EHR data and to develop methods that exploit the distinct advantages of each dataset. We propose to identify the zone of overlap between these data sources and to build bridges between their data representations. These bridges could enable us to better emulate randomized trials using EHR data and to measure the same effects observed in the trials. Consequently, they would allow us to study subgroups that were excluded from the pivotal trials supporting new drug approvals by the FDA. We will test these ideas in the context of ulcerative colitis (UC) and extend them to other diseases in future work. We have obtained access to the raw data from 12 RCTs in UC (N = 6,226). These data contain timed, structured measurements of disease activity, including the Mayo score, a composite score of patient symptoms and endoscopic severity. We have also obtained access to the EHR data of 3,270 UC patients treated at the University of California, San Francisco. These data contain information similar to that in the RCTs, but largely in unstructured form. In addition, these assessments tend to be incomplete relative to trials because some tests are costly and invasive. We will address this problem of unharmonized and incomplete EHR data in three aims. In Aim 1, we will harmonize the RCT data into an analysis-ready format. We will also develop text classification tools to transform free-text EHR data into Mayo subscores and validate these tools against data from a second center. In Aim 2, we will integrate the RCT and EHR data, train algorithms to impute RCT-based representations of the patient state from partial measurements made in EHRs, and test them under conditions typical of real-world data capture. In Aim 3, we will use these algorithms to harmonize EHR data, validate them as a tool for recovering the same effects as RCTs, and study new patient subgroups. The applicant will carry out these aims while training in biostatistics, natural language processing, machine learning, and overall career development. With the help of his mentors, he will launch a career dedicated to developing and disseminating methods for learning from complex clinical data and, in so doing, promote a future of better healthcare for all patients.
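
To make the Mayo score and the incompleteness problem concrete, here is a minimal Python sketch of the standard Mayo scoring convention: four subscores (stool frequency, rectal bleeding, endoscopic findings, physician global assessment), each 0 to 3, summing to a full score of 0 to 12; the partial Mayo score omits the endoscopic subscore and ranges 0 to 9. The class and function names are hypothetical illustrations, not the project's code, and the sketch does not represent the project's text-classification or imputation methods. It only shows why a visit without endoscopy yields a partial score but no full score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MayoSubscores:
    """Hypothetical container for one visit's Mayo subscores (each 0-3).

    endoscopy is Optional because a routine clinic visit recorded in an EHR
    often has no same-day endoscopy.
    """
    stool_frequency: int
    rectal_bleeding: int
    physician_global: int
    endoscopy: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject subscores outside the 0-3 range defined by the Mayo score.
        for name in ("stool_frequency", "rectal_bleeding", "physician_global", "endoscopy"):
            value = getattr(self, name)
            if value is not None and value not in (0, 1, 2, 3):
                raise ValueError(f"{name} must be an integer 0-3, got {value!r}")


def partial_mayo(s: MayoSubscores) -> int:
    """Partial (non-invasive) Mayo score: the three non-endoscopic subscores, range 0-9."""
    return s.stool_frequency + s.rectal_bleeding + s.physician_global


def full_mayo(s: MayoSubscores) -> Optional[int]:
    """Full Mayo score, range 0-12; None when the endoscopic subscore is missing."""
    if s.endoscopy is None:
        return None
    return partial_mayo(s) + s.endoscopy


if __name__ == "__main__":
    # Illustrative, made-up subscores: a trial-style visit with endoscopy
    # and an EHR-style visit without it.
    trial_visit = MayoSubscores(stool_frequency=2, rectal_bleeding=1, physician_global=2, endoscopy=2)
    ehr_visit = MayoSubscores(stool_frequency=3, rectal_bleeding=2, physician_global=2)
    print(full_mayo(trial_visit), partial_mayo(trial_visit))  # 7 5
    print(full_mayo(ehr_visit), partial_mayo(ehr_visit))      # None 7
```

Returning `None` rather than substituting a default keeps the missing endoscopic subscore explicit, which is the gap that an imputation step such as the one described in Aim 2 would need to address.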

Project Outputs

Journal articles (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Algorithmic Identification of Treatment-Emergent Adverse Events From Clinical Notes Using Large Language Models: A Pilot Study in Inflammatory Bowel Disease.

DOI:
10.1002/cpt.3226
Published:
2024
Journal:
Clinical pharmacology and therapeutics
Impact factor:
6.7
Authors:
Silverman,AnnaL;Sushil,Madhumita;Bhasuran,Balu;Ludwig,Dana;Buchanan,James;Racz,Rebecca;Parakala,Mahalakshmi;El-Kamary,Samer;Ahima,Ohenewaa;Belov,Artur;Choi,Lauren;Billings,Monisha;Li,Yan;Habal,Nadia;Liu,Qi;Tiwari,Jawahar;B
Corresponding author:
B

Assessing the Impact of COVID-19 on IBD Outcomes Among Vulnerable Patient Populations in a Large Metropolitan Center.