DMS/NIGMS 2: Statistical Methods and Computational Algorithms for Biobank Data

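Basic information

  • Award number:
    2054253
  • Principal investigator:
  • Amount:
    $955,800
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-07-01 to 2025-06-30
  • Project status:
    Ongoing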

Project abstract

Biobank data are characterized by their volume, velocity, variety, and veracity (4V). Two prime examples are the Million Veteran Program (MVP) at the US Department of Veterans Affairs (VA) and the UK Biobank. The data are big, with up to a million subjects and terabytes of storage (volume). Their sample sizes and data content keep increasing (velocity). They contain heterogeneous sources of information: genomes, electronic health records (EHR), wearable devices, images, and, most recently, COVID-19 data (variety). Furthermore, they are fraught with missingness and inaccuracy (veracity). This project seeks to develop novel statistical methods and computational algorithms that address specific aspects of 4V. The methods are motivated by the principal investigators' recent experience in analyzing MVP and UK Biobank data, and are generalizable to any biobank or other generic big data. The methods provide solutions to some of the most pressing issues in biobank data analysis. The work will push forward several frontiers in statistics, optimization, and genetics. The research will be integrated with substantial education and outreach activities, including developing new courses and software and mentoring students. These activities aim to expose a diverse set of students, including women and minorities, to state-of-the-art statistical and computational techniques for big data analysis.

Three sets of problems are to be investigated. (1) Electronic health records and wearable devices generate a vast amount of longitudinal data in biobanks. In many studies, the within-subject variability of a longitudinal outcome is the primary scientific interest. Motivated by studies of the impacts of blood pressure variability and glycemic variability on diabetes complications, the PIs propose a robust and scalable method for estimation and inference of the effects of both time-varying and time-invariant predictors on within-subject variance. Compared to existing approaches, the method is robust to distributional misspecification and orders of magnitude faster. Its computational scalability makes it a powerful tool for studying trait variability based on massive longitudinal data in biobanks. (2) The PIs will develop a new class of online learning algorithms that combine the majorization-minimization principle in statistics with the stochastic proximal iteration algorithm. The new algorithms apply to a broader class of models and are demonstrably more stable and robust. They help address the volume issue and will be applied to genome-wide association studies of massive biobank data. (3) The PIs propose a bag of little bootstraps (BLB) approach for estimating massive variance component models, which play a central role in genetics and biostatistics. Fitting such models is prohibitive for biobank data because it requires inverting a giant covariance matrix. The BLB approach breaks the massive variance component model into many smaller ones, which are bootstrapped in parallel and then averaged. The new method will enable quantifying heritability and genetic correlation of complex traits in biobank data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
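As a rough illustration of the bag of little bootstraps idea in item (3), the sketch below applies it to a much simpler target than a variance component model: the standard error of a sample mean. The function name `blb_standard_error`, its parameters, the subset-size exponent `gamma`, and the simulated data are all assumptions made for this example; they do not come from the project's software.

```python
import numpy as np

def blb_standard_error(x, n_subsets=10, n_resamples=50, gamma=0.7, seed=0):
    """Bag of little bootstraps estimate of the standard error of mean(x).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    b = int(n ** gamma)  # size of each "little" subset, b << n
    subset_ses = []
    for _ in range(n_subsets):
        # Draw a small subset without replacement.
        subset = rng.choice(x, size=b, replace=False)
        estimates = []
        for _ in range(n_resamples):
            # Multinomial counts emulate a resample of full size n
            # while only the b distinct subset points are ever touched.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates.append(np.average(subset, weights=counts))
        # Quality measure for this subset: spread of the resampled means.
        subset_ses.append(np.std(estimates, ddof=1))
    # Average the per-subset quality measures.
    return float(np.mean(subset_ses))

# Simulated data (an assumption for the example): known sd of 2.
data_rng = np.random.default_rng(1)
x = data_rng.normal(loc=0.0, scale=2.0, size=20_000)
se_blb = blb_standard_error(x)
se_theory = 2.0 / np.sqrt(len(x))
print(f"BLB standard error: {se_blb:.5f}, theoretical: {se_theory:.5f}")
```

Each subset can be processed independently, which is what makes the scheme easy to parallelize; the loop over subsets is kept serial here only for brevity.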

Project outcomes

Journal articles (35)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risk Data: With Applications to Massive Biobank Data.
Risk controlled decision trees and random forests for precision medicine.
  • DOI:
    10.1002/sim.9253
  • Published:
    2022-02-20
  • Journal:
  • Impact factor:
    2
  • Authors:
    Doubleday K;Zhou J;Zhou H;Fu H
  • Corresponding author:
    Fu H
VCSEL: Prioritizing SNP-Set by Penalized Variance Component Selection.
  • DOI:
    10.1214/21-aoas1491
  • Published:
    2021-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Kim J;Shen J;Wang A;Mehrotra DV;Ko S;Zhou JJ;Zhou H
  • Corresponding author:
    Zhou H
Orthogonal Trace-Sum Maximization: Tightness of the Semidefinite Relaxation and Guarantee of Locally Optimal Solutions.

Other publications by Hua Zhou

Exogenous infusion of short-chain fatty acids can improve intestinal functions independently of the gut microbiota
  • DOI:
    10.1093/jas/skaa371
  • Published:
    2020
  • Journal:
  • Impact factor:
    3.3
  • Authors:
    Hua Zhou;Jing Sun;Liangpeng Ge;Zuohua Liu;Hong Chen;Bing Yu;Daiwen Chen
  • Corresponding author:
    Daiwen Chen
Novel Water Harvesting Fibrous Membranes with Directional Water Transport Capability
  • DOI:
    10.1002/admi.201801529
  • Published:
    2019-01
  • Journal:
  • Impact factor:
    5.4
  • Authors:
    Jing Wu;Hua Zhou;Hongxia Wang;Tong Lin;et al.
  • Corresponding author:
    et al.
Macrophage Inhibitor, Semapimod, Reduces Tumor Necrosis Factor-Alpha in Myocardium in a Rat Model of Ischemic Heart Failure
  • DOI:
  • Published:
    2004
  • Journal:
  • Impact factor:
    3
  • Authors:
    A. Kherani;Garrett W Moss;Hua Zhou;A. Gu;Ge Zhang;Allison R. Schulman;Jennifer M. Fal;Robert Sorabella;T. Plasse;Liu Rui;S. Homma;D. Burkhoff;M. Oz;Jie Wang
  • Corresponding author:
    Jie Wang
Cognitive Factors of Weight Management During Pregnancy Among Chinese Women: A Study Applying Protective Motivation Theory
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Xueqing Peng;Nichao Yang;Chi Zhang;A. N. Walker;Yingying Shen;Hua Jiang;Sen Li;H. You;Hua Zhou;Li Wang
  • Corresponding author:
    Li Wang
Computer-based algorithm modeling protein metabolism in aortic regurgitation for positron emission tomography
  • DOI:
  • Published:
    1994
  • Journal:
  • Impact factor:
    0
  • Authors:
    E. Herrold;S. M. Goldfine;Hua Zhou;A. Cooper;S. Nakayama;P. Zanzonico;N. Magid;J. Borer
  • Corresponding author:
    J. Borer

Other grants held by Hua Zhou

SCH: Statistical Foundation and Predictive Modeling for Personalized Diabetes Management: Continuous Glucose Monitoring (CGM), Electronic Health Records (EHR), and Biobanks
  • Award number:
    2205441
  • Fiscal year:
    2022
  • Award amount:
    $955,800
  • Award type:
    Standard Grant
Tensor Regressions and Applications in Neuroimaging Data Analysis
  • Award number:
    1645093
  • Fiscal year:
    2015
  • Award amount:
    $955,800
  • Award type:
    Continuing Grant
Tensor Regressions and Applications in Neuroimaging Data Analysis
  • Award number:
    1310319
  • Fiscal year:
    2013
  • Award amount:
    $955,800
  • Award type:
    Continuing Grant

Similar overseas grants

Collaborative Research: DMS/NIGMS 2: New statistical methods, theory, and software for microbiome data
  • Award number:
    10797410
  • Fiscal year:
    2023
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 1: Statistical Methods for Design and Analysis of Clinical-scale Single Cell Studies
  • Award number:
    2245575
  • Fiscal year:
    2023
  • Award amount:
    $955,800
  • Award type:
    Standard Grant
DMS/NIGMS 2: Advanced Statistical Methods for Spatially Resolved Transcriptomics Studies
  • Award number:
    10493427
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 1: Statistical modeling and estimation of cellular population dynamics
  • Award number:
    10698147
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 1: Statistical modeling and estimation of cellular population dynamics
  • Award number:
    10378318
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 2: Statistical Network Models for Protein Aggregation
  • Award number:
    10673898
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 2: Collaborative Research: Developing Statistical Learning Methods for Revealing the Molecular Signatures of Microvascular Changes in Neural Injury
  • Award number:
    2053832
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
    Continuing Grant
DMS/NIGMS 2: Advanced Statistical Methods for Spatially Resolved Transcriptomics Studies
  • Award number:
    10708800
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 2: Statistical Network Models for Protein Aggregation
  • Award number:
    10493283
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type:
DMS/NIGMS 2: Advanced Statistical Methods for Spatially Resolved Transcriptomics Studies
  • Award number:
    10797593
  • Fiscal year:
    2021
  • Award amount:
    $955,800
  • Award type: