Stochastic Deep Learning for Electronic Health Records: Localizing Learning with Massive and Fragmented Data
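Basic information
- Grant number: 10793778
- Principal investigator:
- Amount: $200,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-25 to 2026-08-31
- Project status: Ongoing
- Source: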
- Keywords: Address; Computers; Data; Data Analyses; Data Science; Dimensions; Electronic Health Record; Foundations; Health; Image; Lasso; Laws; Learning; Linear Models; Markov chain Monte Carlo methodology; Medical; Methods; Patient Care; Patients; Public Health; Sample Size; Scientist; Solid; Structure; System; Technology; Training; Uncertainty; deep learning; deep neural network; flexibility; heterogenous data; improved; innovation; learning strategy; multidimensional data; neural network; theories; tool
Project abstract
Since the turn of the century, the integration of computer technology into medical practice has enabled
scientists to collect massive volumes of electronic health records (EHR); over the same period, deep
learning has emerged as the major tool for massive data analysis. However, EHR data are
heterogeneous [varying widely across patient groups] and fragmented [containing a high
proportion of missing values], which poses a significant barrier to the applicability and generalizability of
current deep neural networks. This project aims to build a health prediction system based on a new type
of stochastic neural network (StoNet) for massive, heterogeneous, and fragmented data, while
considering the integration of omics, imaging, and EHR data in training the system. The StoNet is
formulated as a composition of many simple regressions; it is asymptotically equivalent to the deep neural
network (DNN) in function approximation as the training sample size becomes large, but its structure is
more flexible for dealing with the complexity of EHR data. The StoNet is trained by an adaptive stochastic
gradient Markov chain Monte Carlo (MCMC) algorithm. By leveraging the flexible structure of the
StoNet and the sophisticated adaptive stochastic gradient MCMC algorithm, this project provides a
rigorous statistical framework for deep learning with massive, heterogeneous, and fragmented EHR data.
We show that the StoNet forms a bridge from linear models to deep learning, enabling much of the theory
and many of the methods developed for linear models to be transferred to deep learning. In particular, we show that the
sparse learning theory developed for linear models with the Lasso penalty can be transferred to the
StoNet, leading to an innovative consistent sparse deep learning method; we address the data
heterogeneity issue by replacing each regression of the first hidden layer of the StoNet by a mixture
regression; and we address the missing data issue by training the StoNet with an adaptive stochastic
gradient MCMC algorithm where the missing data are imputed as for a linear model with multiple
imputation methods. The Markovian structure of the StoNet enables the network parameters to be locally
learned with fragmented data and leads to an innovative approach to nonlinear sufficient dimension reduction
of high-dimensional data, facilitating the integration of different types of data in StoNet training. We also show
that the prediction uncertainty of the StoNet can be easily quantified by recursively applying Eve's law (the law of total variance).
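To make the last point concrete: Eve's law is the law of total variance, Var(Y) = E[Var(Y | H)] + Var(E[Y | H]). The sketch below is not the StoNet itself. It is a minimal illustration on a hypothetical two-stage linear-Gaussian chain x -> h -> y, in which the function names (`simulate_chain`, `total_variance`) and the parameters (`w1`, `w2`, `s1`, `s2`) are assumptions chosen for this example. It compares the variance given by the decomposition with a Monte Carlo estimate.

```python
import numpy as np


def simulate_chain(x, w1, w2, s1, s2, n, rng):
    """Draw n samples of (h, y) from a toy chain x -> h -> y (illustrative only)."""
    h = w1 * x + s1 * rng.standard_normal(n)  # latent layer: h | x ~ N(w1 * x, s1^2)
    y = w2 * h + s2 * rng.standard_normal(n)  # output layer: y | h ~ N(w2 * h, s2^2)
    return h, y


def total_variance(w2, s1, s2):
    """Var(y | x) via Eve's law for the toy chain above.

    E[Var(y | h) | x] = s2^2            (noise added at the output layer)
    Var(E[y | h] | x) = w2^2 * s1^2     (latent-layer noise pushed through w2)
    """
    return s2**2 + (w2**2) * (s1**2)


rng = np.random.default_rng(0)
h, y = simulate_chain(x=1.5, w1=0.8, w2=-1.2, s1=0.5, s2=0.3, n=200_000, rng=rng)

analytic = total_variance(w2=-1.2, s1=0.5, s2=0.3)  # 0.09 + 1.44 * 0.25 = 0.45
empirical = y.var()                                  # Monte Carlo estimate of Var(y | x)
print(f"analytic={analytic:.4f}  empirical={empirical:.4f}")
```

With more stochastic layers, the same decomposition can be applied again to the second term, conditioning on one layer at a time, which is the sense in which the application is recursive.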
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by FAMING LIANG
An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis
- Grant number: 9658022
- Fiscal year: 2018
- Funding amount: $200,000
- Project category:
Equivalent Partial Correlation Methods for Integrative Genetic Network Analysis
- Grant number: 9133431
- Fiscal year: 2015
- Funding amount: $200,000
- Project category:
Equivalent Partial Correlation Methods for Integrative Genetic Network Analysis
- Grant number: 9696111
- Fiscal year: 2015
- Funding amount: $200,000
- Project category:
Equivalent Partial Correlation Methods for Integrative Genetic Network Analysis
- Grant number: 9273537
- Fiscal year: 2015
- Funding amount: $200,000
- Project category:
Similar overseas grants
CAREER: Architecting a Hardware-Software Co-Designed Data Management System for Heterogeneous Memory Computers
- Grant number: 2144883
- Fiscal year: 2022
- Funding amount: $200,000
- Project category: Continuing Grant
Geometric Learning to Make Computers Better Understand Structured Data
- Grant number: 19K20353
- Fiscal year: 2019
- Funding amount: $200,000
- Project category: Grant-in-Aid for Early-Career Scientists
Clinicians, Computers, and a Common Language: Building Artificial Intelligence which Collaborates with Human Knowledge on Complex Problems in Medical Image Analysis and Surgical Data Science
- Grant number: 516382-2018
- Fiscal year: 2019
- Funding amount: $200,000
- Project category: Postdoctoral Fellowships
Clinicians, Computers, and a Common Language: Building Artificial Intelligence which Collaborates with Human Knowledge on Complex Problems in Medical Image Analysis and Surgical Data Science
- Grant number: 516382-2018
- Fiscal year: 2018
- Funding amount: $200,000
- Project category: Postdoctoral Fellowships
SHF: Small: Hyperscaling Data Analytics for High-Performance Computers
- Grant number: 1816577
- Fiscal year: 2018
- Funding amount: $200,000
- Project category: Standard Grant
Clinicians, Computers, and a Common Language: Building Artificial Intelligence which Collaborates with Human Knowledge on Complex Problems in Medical Image Analysis and Surgical Data Science
- Grant number: 516382-2018
- Fiscal year: 2017
- Funding amount: $200,000
- Project category: Postdoctoral Fellowships
Clinical Trials Data Collection Using Palm Computers
- Grant number: 6444163
- Fiscal year: 2002
- Funding amount: $200,000
- Project category:
Capturing Data in the Field: An Application Framework for Easily Creating Custom Data and Metadata Entry Forms on Handheld and Desktop Computers
- Grant number: 0131178
- Fiscal year: 2002
- Funding amount: $200,000
- Project category: Continuing Grant
Data Computation and Software Development in Number Theory by Computers
- Grant number: 10440012
- Fiscal year: 1998
- Funding amount: $200,000
- Project category: Grant-in-Aid for Scientific Research (B)
Computers for analysis of SNO data
- Grant number: 195975-1997
- Fiscal year: 1998
- Funding amount: $200,000
- Project category: Subatomic Physics Envelope - Research Tools and Instruments