Leveraging Heterogeneous Data Across International Borders in a Privacy Preserving Manner for Clinical Deep Learning
Basic Information
- Award number: 1822378
- Amount: $300,000
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Duration: 2018-03-15 to 2021-02-28
- Status: Completed
Abstract
There is a growing awareness of the need for multi-center clinical databases and multi-institutional analyses of healthcare data to ensure reproducibility and generalizability of research findings. Algorithms developed on a single database are prone to three distinct problems. First, in the context of Big Data science, the size of the data relative to the number of variables makes it difficult to develop complex predictors without overfitting, while more traditional learning algorithms may yield over-simplified models that fail to capture important related influences or interactions between different types of healthcare information. Second, training and testing predictive models on a single database can lead a model to learn noise, irrelevant local practices, or differences in definitions that are correlated with, but not causally related to, the outcome in question. The resulting models do not work in other institutions, or in the future when practices or the environment change. Third, sharing data between institutions, and particularly across borders, is extremely problematic because of trust, legal issues, privacy issues and national policies. Solving these issues matters for three reasons: 1) it would allow the creation of strong, generalizable data science models that leverage enormous pools of data from around the world; 2) it would allow the identification of rare diseases or patient types, which become less rare as databases are compiled; and 3) perhaps most importantly, it would allow the free exchange of data science models and generalized approaches to solving medical problems in the cloud. This project aims to develop a set of distributed deep learning and cloud computation techniques for cross-institution and cross-border machine learning on health and medical data without the need for protected health information to leave the generating institution.
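The core idea above, learning from several institutions while the records stay where they were generated, can be illustrated with a single weight-aggregation step. The sketch below uses federated averaging, a widely used scheme in which each site trains locally and shares only its model weights and sample count. The abstract does not say the project used this exact rule; the function name `federated_average`, the flat weight vectors and the example numbers are hypothetical.

```python
"""Minimal sketch of cross-site weight averaging (federated averaging style).

Assumption (not from the source): each site trains locally and shares only a
flat list of model weights plus its local sample count; raw patient records
never leave the site. Illustrative only, not the project's architecture.
"""
from typing import List, Sequence


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Return the sample-size-weighted mean of per-site weight vectors."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one weight vector and one size per site")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must share the same model shape")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        share = size / total  # larger sites contribute proportionally more
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Hypothetical example: a US site with 300 patients and an EU site with 100.
us_weights = [0.2, -1.0, 0.5]
eu_weights = [0.6, -0.2, 0.1]
global_weights = federated_average([us_weights, eu_weights], [300, 100])
print(global_weights)
```

Weighting by sample count keeps a small site from pulling the shared model as hard as a large one; in a full training loop the averaged weights would be sent back to every site for another round of local training.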
The goals are to create demonstration programs that illustrate feasibility and to open-source the architecture. The scope of this project encompasses the broad set of machine learning-based tasks that multiple institutions may want to apply to their healthcare data in the cloud, as well as the technical issues surrounding transfer learning of knowledge across domains (e.g., institutions/demographics) and tasks (e.g., types of classification and prediction problems). The project has three specific aims: 1) develop a cloud-based infrastructure that preserves the regional autonomy of data but allows the parameters of a partially trained deep neural network (including weights and hyperparameters) to be shared between regions, enabling transfer learning across domains and tasks; 2) develop a standardized coded model for deep learning approaches in medical applications; and 3) evaluate the effect of training and testing the model across multiple centers and national boundaries, by measuring the performance improvement gained from cross-institutional training without loss of privacy protection, using sensitivity, specificity, positive predictive value, area under the receiver operating characteristic (ROC) curve and model calibration. Aims 1-3 will be achieved by taking four databases (a database of intensive care unit patients with sepsis, a free-text corpus of nursing progress notes, voice recordings from a public corpus classically used for speaker identification, and a public database of full-face images used for classification of facial expressions), placing them in the cloud (Google, AWS and Azure) at different geopolitical locations (namely the US and Europe), and developing a distributed deep learning architecture that learns to improve its performance by sharing weights, but not sensitive patient data, across borders. This project has the potential to make several contributions to the field.
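Aim 3 names the evaluation metrics. A minimal sketch of how they could be computed for a binary classifier (for example, sepsis versus no sepsis) follows, assuming 0/1 labels, scores in [0, 1] and a hypothetical decision threshold of 0.5. AUROC uses the rank (Mann-Whitney) formulation, and calibration is reduced to one simple summary, calibration-in-the-large (mean predicted risk minus observed event rate); a fuller calibration analysis would also examine calibration curves. The function name and the labels and scores are invented for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, AUROC and calibration-in-the-large.

    Assumes labels are 0/1, scores are predicted probabilities in [0, 1],
    and both classes are present.
    """
    if len(y_true) != len(y_score):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")

    # Confusion-matrix counts at the chosen threshold.
    tp = sum(1 for s in pos if s >= threshold)
    fn = len(pos) - tp
    fp = sum(1 for s in neg if s >= threshold)
    tn = len(neg) - fp

    # AUROC: chance a random positive outscores a random negative (ties = 0.5).
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "auroc": wins / (len(pos) * len(neg)),
        # Mean predicted risk minus observed event rate (0 = matches on average).
        "calibration_in_the_large": sum(y_score) / len(y_score)
        - sum(y_true) / len(y_true),
    }


# Hypothetical labels (1 = sepsis) and model scores for six patients.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
metrics = binary_metrics(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing these numbers for a model trained at one site and for a model trained across sites, both tested on the same held-out data, is one way to quantify the improvement that Aim 3 describes.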
First, it will demonstrate that medical data across geopolitical boundaries can be made available in an interoperable manner (using the FHIR standard) and can be used to train deep learning algorithms in a privacy-preserving manner, thus addressing both Health Insurance Portability and Accountability Act (HIPAA) concerns and interoperability. Second, it will provide open-source deep learning algorithms for several medical datasets and data types that other institutions can use to solve similar problems with some fine-tuning (e.g., via transfer learning). Third, it will provide a set of open-source meta-algorithms for transfer learning (across domains and tasks), implemented in the cloud in Docker containers that can be downloaded for local use or moved between cloud vendors. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
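The first contribution relies on the FHIR standard to make data interoperable. As a rough illustration of what that means at the record level, the sketch below builds a minimal FHIR R4 Observation resource for one heart-rate reading, using LOINC code 8867-4 (heart rate) and the UCUM unit `/min`. The function name, patient identifier, value and timestamp are hypothetical, and the abstract does not specify which FHIR resources or profiles the project actually used.

```python
import json
from typing import Any, Dict


def heart_rate_observation(patient_id: str, bpm: float,
                           when: str) -> Dict[str, Any]:
    """Build a minimal FHIR R4 Observation for a single heart-rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }],
        }],
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8867-4",  # LOINC: heart rate
                "display": "Heart rate",
            }],
        },
        # Hypothetical local patient id; in the setting the abstract describes,
        # records like this stay at the institution that generated them.
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",  # UCUM code for "per minute"
        },
    }


observation = heart_rate_observation("example-icu-001", 112, "2019-01-01T08:00:00Z")
print(json.dumps(observation, indent=2))
```

Because every site would describe the same measurement with the same codes and units, a model trained on one site's FHIR-formatted data can be applied to another's without per-site feature mapping.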
Project Outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019
- DOI: 10.1097/ccm.0000000000004145
- Published: 2020-02-01
- Impact factor: 8.8
- Authors: Reyna, Matthew A.; Josef, Christopher S.; Sharma, Ashish
- Corresponding author: Sharma, Ashish
A Deep Learning Architecture for Psychometric Natural Language Processing
- DOI: 10.1145/3365211
- Published: 2020-02
- Impact factor: 0
- Authors: Faizan Ahmad; A. Abbasi; Jingjing Li; David G. Dobolyi; Richard G. Netemeyer; G. Clifford; Hsinchun Chen
- Corresponding authors: Faizan Ahmad; A. Abbasi; Jingjing Li; David G. Dobolyi; Richard G. Netemeyer; G. Clifford; Hsinchun Chen
DeepAISE on FHIR — An Interoperable Real-Time Predictive Analytic Platform for Early Prediction of Sepsis
- Published: 2018
- Impact factor: 0
- Authors: Lakshman, Vidyashankar; Amrollahi, Fatemeh; Koppisetty, Veera Supraja; Shashikumar, Supreeth P.; Sharma, Ashish; Nemati, Shamim
- Corresponding author: Nemati, Shamim
Other publications by Gari Clifford
ECG QT interval estimation with a transfer deep learning model
- DOI: 10.1016/j.jelectrocard.2023.03.053
- Published: 2023-05-01
- Authors: Joel Xue; Aarya Parekh; Miguel Kirsch; Reena Yuan; Daniel Treiman; David Albert; Gari Clifford
- Corresponding author: Gari Clifford
P123. Anxiety Sensitivity is a Leading Risk Factor of Severe or Widespread Pain Three Months After Motor Vehicle Collision
- DOI: 10.1016/j.biopsych.2022.02.357
- Published: 2022-05-01
- Authors: Kyle Polanco; Qinghua Li; Xinming An; Francesca Beaudoin; Donglin Zeng; Jennifer Stevens; Sarah Linnstaedt; Tanja Jovanovic; Thomas Neylan; Gari Clifford; Kerry Ressler; Karestan Koenen; Ronald Kessler; Samuel A. McLean
- Corresponding author: Samuel A. McLean
P639. “Ask Your Heart What It Doth Know”: 100+ Heart Rate Variability-Based Biomarkers of Mental and Physical Health Identified in a Large Cohort of Trauma Survivors
- DOI: 10.1016/j.biopsych.2022.02.876
- Published: 2022-05-01
- Authors: Lindsay Macchio; Lauriane Guichard; Yinyao Ji; Xinming An; Thomas Neylan; Gari Clifford; Qiao Li; Jennifer Stevens; Tanja Jovanovic; Sarah Linnstaedt; Kerry Ressler; Karestan Koenen; Ronald Kessler; Samuel McLean for the AURORA Study Group
- Corresponding author: Samuel McLean for the AURORA Study Group
PO-703-04 ECG-AI CAN PREDICT RISK FOR HEART FAILURE WITH BOTH PRESERVED AND REDUCED EJECTION FRACTION
- DOI: 10.1016/j.hrthm.2022.03.1058
- Published: 2022-05-01
- Impact factor: 5.700
- Authors: Ibrahim Karabayir; Liam Butler; Dalane Kitzman; Alvaro Alonso; Geoff Tison; Lin Yee Chen; Gari Clifford; Elsayed Z. Soliman; Oguz Akbilgic
- Corresponding author: Oguz Akbilgic
371. Objectively-Characterized Peritraumatic Sleep Phenotypes Are Associated With Both Pre-Trauma Characteristics and Peritraumatic Symptom Outcomes
- DOI: 10.1016/j.biopsych.2024.02.870
- Published: 2024-05-15
- Authors: Oliver Holmes; Meredith Bucher; Thomas Neylan; Gari Clifford; Qiao Li; Qinghua Li; Robert Dougherty; Justin Baker; Sarah Linnstaedt; Tanja Jovanovic; Jennifer Stevens; Stacey House; Kerry Ressler; Ronald Kessler; Samuel McLean; Xinming An
- Corresponding author: Xinming An
Other grants by Gari Clifford
BD Spokes: SPOKE: SOUTH: Large-Scale Medical Informatics for Patient Care Coordination and Engagement
- Award number: 1636933
- Fiscal year: 2016
- Amount: $300,000
- Award type: Standard Grant
Multi-scale markers of circadian rhythm changes for monitoring of mental health
- Award number: EP/K020161/1
- Fiscal year: 2013
- Amount: $300,000
- Award type: Research Grant
Similar overseas grants
CRII: CSR: Adaptive Federated Continuous Learning on Heterogeneous Edge Devices with Unlabeled Data
- Award number: 2348279
- Fiscal year: 2024
- Amount: $300,000
- Award type: Standard Grant
CAREER: A Platform for Per-Packet AI using Heterogeneous Data Planes
- Award number: 2338034
- Fiscal year: 2024
- Amount: $300,000
- Award type: Continuing Grant
Deep Learning for 3-D reconstruction of heterogeneous molecular structures from Cryo-EM data
- Award number: BB/Y513878/1
- Fiscal year: 2024
- Amount: $300,000
- Award type: Research Grant
Structural Identification and Condition Assessment of Prestressed Concrete Bridges in Marine Environment Using Heterogeneous Test Data
- Award number: 24K17344
- Fiscal year: 2024
- Amount: $300,000
- Award type: Grant-in-Aid for Early-Career Scientists
GOALI: Frameworks: At-Scale Heterogeneous Data based Adaptive Development Platform for Machine-Learning Models for Material and Chemical Discovery
- Award number: 2311632
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Towards Harmonious Federated Intelligence in Heterogeneous Edge Computing via Data Migration
- Award number: 2312617
- Fiscal year: 2023
- Amount: $300,000
- Award type: Continuing Grant
FuSe-TG: FAB: A Heterogeneous Ferroelectronics Platform for Accelerating Big Data Analytics
- Award number: 2235366
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: Scalable Data-Enabled Predictive Control for Heterogeneous Mixed Traffic Systems
- Award number: 2320697
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant
Collaborative Research: III: Medium: Knowledge discovery from highly heterogeneous, sparse and private data in biomedical informatics
- Award number: 2312862
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant
CAREER: Harnessing Heterogeneous Sources of Data and Artificial Intelligence for Informed Flood Management
- Award number: 2238639
- Fiscal year: 2023
- Amount: $300,000
- Award type: Standard Grant