Collaborative Research: Use of Random Compression Matrices For Scalable Inference in High Dimensional Structured Regressions

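Basic information

  • Award number:
    2210672
  • Principal investigator:
  • Amount:
    $180,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-06-15 to 2025-05-31
  • Project status:
    Not yet concluded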

Project abstract

As the scientific community moves into a data-driven era, there is an unprecedented opportunity to leverage large-scale imaging, genetic and electronic health record (EHR) data to better characterize and understand human disease and so improve treatment and prognosis. Consequently, analysis of such datasets with flexible statistical models has become an enormously active area of research over the last decade. To this end, this project plans to develop a new class of methods based on fitting statistical models to datasets obtained by compressing big data with a well-designed mechanism. The development enables efficient modeling of massive data on an unprecedented scale. While the investigators are motivated primarily by complex modeling and uncertainty quantification of massive biomedical data, the statistical methods are general enough to leave an important footprint in the related literature of machine learning and environmental sciences. The overarching goal also includes the development of software toolkits to better serve practitioners in related disciplines. Further, the project will provide first-hand training opportunities for graduate and undergraduate students, including women and students from minority communities, in state-of-the-art statistical methodologies and imaging/genetic/EHR data. By disseminating the outcome of the project among high school students in terminology that they can understand, the project can have far-reaching effects in enhancing public scientific literacy about statistics.

Two crucial aspects of modern statistical learning approaches in the era of complex and high dimensional data are accuracy and scale in inference. Modern data are increasingly complex and high dimensional, involving a large number of variables and a large sample size, with complex relationships between different variables. Developing practically efficient (in terms of storage and analysis) and theoretically "optimal" Bayesian high dimensional parametric or nonparametric regression methods that draw accurate inference with valid uncertainties from such complex datasets is an extremely important problem. To offer a general solution to this problem, the investigators will develop approaches based on data compression using a small number of random linear transformations. The approach either compresses the large number of records corresponding to each variable, in which case it maintains feature interpretation for adequate inference, or compresses the covariate vector of each sample to a lower dimension, in which case the focus is only on prediction of the response. In either case, data compression facilitates storage-efficient, scalable and accurate Bayesian inference and prediction in the presence of high dimensional data with sufficiently rich parametric and nonparametric regression models. An important goal is to establish precise theoretical results on the convergence behavior of the models fitted to compressed data as a function of the number of predictors, the sample size, the properties of the random linear transformations and the features of these models. The approaches will be used to study neurological disorders by combining brain imaging data, genetic data and EHR data from the UK Biobank database. The project will also contribute on a broader front to advancing interdisciplinary research training and broadening participation in statistical sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
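
The two compression modes described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the project: the Gaussian random matrices and their scaling, the conjugate Gaussian prior with known noise variance, the approximation that the compressed noise stays roughly white, and the hypothetical function names `compress_rows`, `compress_columns` and `gaussian_posterior`.

```python
import numpy as np


def compress_rows(X, y, m, rng):
    """Compress n records down to m < n rows (hypothetical helper).

    Phi has i.i.d. N(0, 1/n) entries, so E[Phi @ Phi.T] is the identity
    and the compressed noise is only approximately white (an assumption).
    Columns of X are untouched, so coefficients keep their interpretation.
    """
    n = X.shape[0]
    Phi = rng.normal(scale=1.0 / np.sqrt(n), size=(m, n))
    return Phi @ X, Phi @ y


def compress_columns(X, k, rng):
    """Compress p covariates down to k < p features (hypothetical helper).

    Psi has i.i.d. N(0, 1/k) entries. The compressed features are mixtures
    of the original covariates, so this mode targets prediction only.
    """
    p = X.shape[1]
    Psi = rng.normal(scale=1.0 / np.sqrt(k), size=(k, p))
    return X @ Psi.T, Psi


def gaussian_posterior(X, y, noise_var=1.0, prior_var=1.0):
    """Conjugate posterior for beta in y ~ N(X beta, noise_var * I),
    with prior beta ~ N(0, prior_var * I). Returns (mean, covariance)."""
    p = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(p) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, m = 2000, 5, 200
    X = rng.normal(size=(n, p))
    beta = np.arange(1.0, p + 1.0)
    y = X @ beta + 0.1 * rng.normal(size=n)

    # Mode 1: fewer records, same covariates (inference on beta).
    Xc, yc = compress_rows(X, y, m, rng)
    mean, _ = gaussian_posterior(Xc, yc, noise_var=0.01)
    print("posterior mean from compressed records:", mean)

    # Mode 2: same records, fewer covariates (prediction only).
    Z, Psi = compress_columns(X, 3, rng)
    mean_z, _ = gaussian_posterior(Z, y, noise_var=0.01)
    print("in-sample predictions, first three:", (Z @ mean_z)[:3])
```

In the first mode only the m compressed rows need to be stored, which is where the storage saving comes from; in the second mode a new sample must be mapped through the same `Psi` before prediction.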

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Distributed Bayesian Inference in Massive Spatial Data
  • DOI:
    10.1214/22-sts868
  • Published:
    2023-01
  • Journal:
  • Impact factor:
    5.7
  • Authors:
    Rajarshi Guhaniyogi;Cheng Li;T. Savitsky;Sanvesh Srivastava
  • Corresponding authors:
    Rajarshi Guhaniyogi;Cheng Li;T. Savitsky;Sanvesh Srivastava

Other publications by Rajarshi Guhaniyogi

Bayesian Conditional Density Filtering
Bayesian nonparametric areal wombling for small‐scale maps with an application to urinary bladder cancer data from Connecticut
  • DOI:
    10.1002/sim.7408
  • Published:
    2017
  • Journal:
  • Impact factor:
    2
  • Authors:
    Rajarshi Guhaniyogi
  • Corresponding author:
    Rajarshi Guhaniyogi
Approximated Bayesian Inference for Massive Streaming Data
  • DOI:
  • Published:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rajarshi Guhaniyogi;R. Willett;D. Dunson
  • Corresponding author:
    D. Dunson
InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bowen Lei;Rajarshi Guhaniyogi;Krishnendu Chandra;Aaron Scheffler;Bani Mallick
  • Corresponding author:
    Bani Mallick
Data Sketching and Stacking: A Confluence of Two Strategies for Predictive Inference in Gaussian Process Regressions with High-Dimensional Features
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Samuel Gailliot;Rajarshi Guhaniyogi;Roger D. Peng
  • Corresponding author:
    Roger D. Peng

Other grants held by Rajarshi Guhaniyogi

Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
  • Award number:
    2220840
  • Fiscal year:
    2021
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
  • Award number:
    1854662
  • Fiscal year:
    2019
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant

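Similar grants from the National Natural Science Foundation of China

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Project type:
    General Program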

Similar overseas grants

Collaborative Research: NCS-FR: DEJA-VU: Design of Joint 3D Solid-State Learning Machines for Various Cognitive Use-Cases
  • Award number:
    2319619
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Continuing Grant
Collaborative Research: BoCP-Design US-Sao Paulo: Land use change, ecosystem resilience and zoonotic spillover risk
  • Award number:
    2225023
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: BoCP-Design US-Sao Paulo: Land use change, ecosystem resilience and zoonotic spillover risk
  • Award number:
    2225022
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: CAS-Climate: Linking Activities, Expenditures and Energy Use into an Integrated Systems Model to Understand and Predict Energy Futures
  • Award number:
    2243099
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: BoCP-Design: US-South Africa: Turning CO2 to stone: the ecosystem service of the oxalate-carbonate pathway and its sensitivity to land use change
  • Award number:
    2224994
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: MUCUS: Measuring and Understanding the Cassiopea Use of Space
  • Award number:
    2227068
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: PPoSS: LARGE: Research into the Use and iNtegration of Data Movement Accelerators (RUN-DMX)
  • Award number:
    2316176
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Continuing Grant
Collaborative Research: RUI: Trust but Verify: The Use of Intuition in Engineering Problem Solving
  • Award number:
    2325524
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: CyberTraining: Pilot: Building a strong community of computational researchers empowered in the use of novel cutting-edge technologies
  • Award number:
    2320990
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant
Collaborative Research: GEO OSE Track 2: Sustainable Open Science Tools to Democratize Use of 3D Geomaterial Data
  • Award number:
    2324786
  • Fiscal year:
    2023
  • Funding amount:
    $180,000
  • Project type:
    Standard Grant