SCH: INT: Collaborative Research: High-throughput Phenotyping on Electronic Health Records using Multi-Tensor Factorization

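Basic information

Award number:
1417697
Principal investigator:
Joydeep Ghosh
Amount:
approximately $663,600
Institution:
University of Texas at Austin
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Period:
2014-09-01 to 2019-08-31
Status:
Completed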

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1417697&HistoricalAwards=false
Keywords:
SCH INT Collaborative Research throughput

Abstract

As the adoption of electronic health records (EHRs) has grown, EHRs are now composed of a diverse array of data, including structured information (e.g., diagnoses, medications, and lab results), molecular sequences, unstructured clinical progress notes, and social network information. There is mounting evidence that EHRs are a rich resource for clinical research, but they are notoriously difficult to leverage because of their orientation to healthcare business operations, heterogeneity across commercial systems, and high levels of missing or erroneous entries. Moreover, the interactions among different data sources within an EHR are challenging to model, hampering our ability to leverage traditional analytic frameworks. In recognition of this problem, various efforts have been undertaken to transform EHR data into concise and meaningful concepts, or phenotypes. Yet, to date, these efforts have been ad hoc and labor-intensive, resulting in specific phenotypes for specific environments; e.g., type 2 diabetes in the EHR system at Vanderbilt University Medical Center (VUMC). There is an urgent need for scalable phenotyping methods, but several major challenges must be addressed, including: a) patient representation, b) high-throughput phenotype generation from EHRs, c) expert-guided phenotype refinement, and d) phenotype adaptation across institutions. The goal of this project is to address these challenges by developing a general computational framework for transforming EHR data into meaningful phenotypes with only modest levels of expert guidance. The PIs will develop novel courses on Healthcare Analytics as a Massive Open Online Course (MOOC) that covers cross-disciplinary topics at the confluence of computer science and medical informatics, while enriching existing graduate courses on biomedical informatics.
The PIs plan to deliver tutorials and organize workshops at relevant computer science and medical informatics conferences with the goal of sharing research results and developing a community. The PIs will develop outreach modules that focus on freshmen and under-represented students, as well as educational sessions for clinical researchers who are currently performing phenotyping in academic medical centers. Thus, the project has a significant component that integrates research and education and provides new scientific insights.
In support of this goal, the team plans to represent and analyze EHR data as inter-connected high-order relations, i.e., tensors (e.g., tuples of patient-medication-diagnosis, patient-lab, and patient-symptoms). The proposed analytic framework generalizes several existing data mining methodologies, including dimensionality reduction, topic modeling, and co-clustering, which all arise as limited special cases of analyzing second-order tensors. It will also enable flexible refinement of candidates to adapt phenotypes from one healthcare institution to another, and will incorporate feedback from domain experts. The accompanying suite of algorithms and methods will enable the automation of high-throughput phenotype generation, refinement, adaptation, and application in a broad range of health informatics settings and across multiple institutions. This project will integrate biomedical informaticists, computer scientists, and clinical experts. The project will demonstrate the significance of the resulting phenotypes in diverse clinical applications, including: a) cohort construction, where case and control patients are identified with respect to specific phenotype combinations; b) genome-wide association studies (GWAS), where target phenotypes of patients are tested against DNA sequence variation for significant statistical associations; and c) clinical predictive modeling, where a model is developed to predict target phenotypes or diseases.
The framework will be developed with publicly accessible data from MIMIC-II and CMS and validated in real clinical environments at Northwestern Memorial Hospital and VUMC through several high-impact disease targets (including hypertension, type 2 diabetes, hypothyroidism, atrial fibrillation, rheumatoid arthritis, and multiple sclerosis). Additionally, the methodologies developed through this project will be integrated into existing software platforms that support the representation of EHR-derived phenotypes, but lack a data-driven component for the generation and refinement of candidates. Overall, the proposed framework is expected to have a major impact on translational clinical research, including clinical trial design, predictive modeling, epidemiology studies, and clinical decision support.
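The abstract describes representing EHR data as tensors such as patient-medication-diagnosis tuples, then factorizing them into candidate phenotypes. The following is a minimal sketch of that idea, not the project's actual multi-tensor method: it assumes a single third-order count tensor and uses a standard nonnegative CP (CANDECOMP/PARAFAC) factorization with multiplicative updates. The function names (`build_count_tensor`, `nonneg_cp`), the toy events, and the choice of rank 2 are all hypothetical and chosen for illustration only.

```python
import numpy as np


def build_count_tensor(events, n_patients, n_meds, n_dx):
    """Accumulate (patient, medication, diagnosis) co-occurrence counts."""
    tensor = np.zeros((n_patients, n_meds, n_dx))
    for p, m, d in events:
        tensor[p, m, d] += 1.0
    return tensor


def nonneg_cp(tensor, rank, n_iter=200, seed=0, eps=1e-9):
    """Nonnegative CP factorization via multiplicative updates.

    Returns factor matrices A (patients), B (medications), C (diagnoses)
    such that tensor[i, j, k] is approximated by sum_r A[i,r]*B[j,r]*C[k,r].
    """
    rng = np.random.default_rng(seed)
    # Strictly positive random initialization keeps all updates well defined.
    A, B, C = (rng.random((dim, rank)) + 0.1 for dim in tensor.shape)
    for _ in range(n_iter):
        # Each update multiplies a factor by (data term) / (model term).
        num = np.einsum("ijk,jr,kr->ir", tensor, B, C)
        A *= num / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        num = np.einsum("ijk,ir,kr->jr", tensor, A, C)
        B *= num / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        num = np.einsum("ijk,ir,jr->kr", tensor, A, B)
        C *= num / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


# Toy data (invented): patients 0-2 pair medication 0 with diagnosis 0,
# patients 3-5 pair medication 1 with diagnosis 1.
events = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 1, 1), (4, 1, 1), (5, 1, 1)]
X = build_count_tensor(events, n_patients=6, n_meds=2, n_dx=2)
A, B, C = nonneg_cp(X, rank=2)

# Relative reconstruction error of the rank-2 model.
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

In this sketch, each column index `r` plays the role of one candidate phenotype: `B[:, r]` and `C[:, r]` weight the medications and diagnoses that define it, and `A[:, r]` scores how strongly each patient expresses it. The nonnegativity constraint is a design choice made here so the factors read as additive groupings; the expert-guided refinement and cross-institution adaptation described in the abstract are not covered.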
