Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
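Basic information
- Award number: 1854662
- Principal investigator:
- Amount: $172,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Start and end dates: 2019-06-15 to 2022-03-31
- Project status: Completed
- Source:
- Keywords:
Project abstract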
With tremendous advances in spatial referencing technologies such as Global Positioning Systems, which can identify geographical coordinates with a simple hand-held device, researchers in various disciplines have gathered an unprecedented variety of geo-coded temporal data. Consequently, modeling spatiotemporal data with flexible statistical models has become a very active area of research over the last decade in many disciplines, including the environmental sciences, health sciences, and oceanography. In all these applications, researchers require efficient data modeling tools that can adapt to the complexity and size of modern spatiotemporal data, so that they can quickly fit a variety of scientific models that explain the intricate nature of associations. This research project develops a new class of distributed Bayesian statistical algorithms, Aggregated Monte Carlo (AMC), that enables efficient modeling of massive spatiotemporal data on an unprecedented scale. While the PIs' motivation comes primarily from complex modeling and uncertainty quantification of massive spatiotemporal data, the proposed algorithm is general enough to contribute to the related literature on machine learning and computer experiments. The overarching goal also includes the development of software toolkits to better serve practitioners in related disciplines.
There has been an explosion in the size, complexity, and availability of spatiotemporally indexed data. This growth has outpaced the development of Bayesian statistical methodology: fitting state-of-the-art methods based on stochastic processes for analyzing spatiotemporal point-referenced and point-process data is prohibitively slow unless restrictive assumptions are imposed. The main problem is that the Monte Carlo (MC) computations in Markov chain Monte Carlo (MCMC) methods for fitting these models scale poorly with the size of the data.
To solve this problem, the PIs develop a general framework, called Aggregated Monte Carlo (AMC), for scaling MC computations in the stochastic process-based modeling of massive space-time data using a divide-and-conquer technique. AMC has three stages: dividing the data into smaller subsets, obtaining posterior samples of the unknown parameters and latent variables on every subset using MCMC, and combining the MCMC samples from all the subsets. AMC is designed to boost the scalability of any state-of-the-art model based on a stochastic process. Computationally, the main innovations include the development of general division and combination schemes for data with diverse spatiotemporal structures. Theoretically, the project provides bounds on the number of subsets such that the posterior distribution estimated using AMC provides a near-optimal approximation of the full-data posterior distribution in terms of decay of the posterior risks and contraction rates. Conceptually, AMC provides a natural extension of the existing results for combination using the barycenter of subset posterior distributions, from parametric models to non-parametric models with complex spatiotemporal structures. The most appealing features of AMC are that it exploits parallel computer architecture for efficient and flexible modeling of massive spatiotemporal data and that it provides posterior inference and uncertainty estimates with theoretical guarantees. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
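The three stages described in the abstract (divide, sample on each subset, combine) can be illustrated with a toy sketch. This is not the PIs' implementation. It assumes a scalar normal-mean model with known variance, a random partition of the data, exact conjugate draws standing in for MCMC, a subset likelihood raised to the power k so that each subset posterior has roughly the full-data spread, and a one-dimensional Wasserstein-2 barycenter computed by averaging sorted draws. All function names and parameters (`subset_posterior_draws`, `barycenter_1d`, `amc_sketch`, `prior_var`, and so on) are hypothetical.

```python
import numpy as np


def subset_posterior_draws(y_subset, k, sigma2, prior_var, n_draws, rng):
    """Draw from the subset posterior of a normal mean (prior N(0, prior_var)).

    Assumption: the subset likelihood is raised to the power k, so a subset
    of size m behaves like k * m observations. Exact conjugate draws stand
    in for the MCMC step named in the abstract.
    """
    m = len(y_subset)
    precision = 1.0 / prior_var + k * m / sigma2
    mean = (k * y_subset.sum() / sigma2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)


def barycenter_1d(draws_by_subset):
    """Combine subset draws via the 1D Wasserstein-2 barycenter.

    In one dimension the barycenter's quantile function is the average of
    the subset quantile functions, so averaging sorted draws suffices.
    """
    sorted_draws = np.sort(np.asarray(draws_by_subset), axis=1)
    return sorted_draws.mean(axis=0)


def amc_sketch(y, k, sigma2=1.0, prior_var=100.0, n_draws=2000, seed=0):
    """Divide the data, sample each subset posterior, combine the draws."""
    rng = np.random.default_rng(seed)
    # Stage 1: random partition into k subsets (an assumed division scheme).
    subsets = np.array_split(rng.permutation(y), k)
    # Stage 2: posterior draws per subset (independent, hence parallelizable).
    draws = [
        subset_posterior_draws(s, k, sigma2, prior_var, n_draws, rng)
        for s in subsets
    ]
    # Stage 3: combine the subset posteriors through their barycenter.
    return barycenter_1d(draws)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=10_000)
    combined = amc_sketch(data, k=10)
    print("combined posterior mean:", combined.mean())
    print("combined posterior sd:  ", combined.std())
```

In this toy model the combined draws can be checked against the closed-form full-data posterior. For the stochastic-process models targeted by the project, the division and combination schemes would need to respect spatiotemporal structure, which this sketch does not attempt.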
Project outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Joint Bayesian Estimation of Voxel Activation and Inter-regional Connectivity in fMRI Experiments
- DOI: 10.1007/s11336-020-09727-0
- Published: 2020-09
- Journal:
- Impact factor: 3
- Authors: Daniel Spencer; Rajarshi Guhaniyogi; R. Prado
- Corresponding authors: Daniel Spencer; Rajarshi Guhaniyogi; R. Prado
Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior
- DOI:
- Published: 2020-06
- Journal:
- Impact factor: 0
- Authors: Rajarshi Guhaniyogi; Cheng Li; T. Savitsky; Sanvesh Srivastava
- Corresponding authors: Rajarshi Guhaniyogi; Cheng Li; T. Savitsky; Sanvesh Srivastava
High Dimensional Bayesian Regularization in Regressions Involving Symmetric Tensors
- DOI: 10.1007/978-3-030-50153-2_26
- Published: 2020-05-16
- Journal:
- Impact factor: 0
- Authors: Guhaniyogi R
- Corresponding author: Guhaniyogi R
Bayesian Generalized Sparse Symmetric Tensor-on-Vector Regression
- DOI: 10.1080/00401706.2020.1784799
- Published: 2020-07-18
- Journal:
- Impact factor: 2.5
- Authors: Guha, Sharmistha; Guhaniyogi, Rajarshi
- Corresponding author: Guhaniyogi, Rajarshi
Other publications by Rajarshi Guhaniyogi
Bayesian Conditional Density Filtering
- DOI: 10.1080/10618600.2017.1422431
- Published: 2014
- Journal:
- Impact factor: 2.4
- Authors: Rajarshi Guhaniyogi; S. Qamar; D. Dunson
- Corresponding author: D. Dunson
Bayesian nonparametric areal wombling for small‐scale maps with an application to urinary bladder cancer data from Connecticut
- DOI: 10.1002/sim.7408
- Published: 2017
- Journal:
- Impact factor: 2
- Authors: Rajarshi Guhaniyogi
- Corresponding author: Rajarshi Guhaniyogi
Approximated Bayesian Inference for Massive Streaming Data
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Rajarshi Guhaniyogi; R. Willett; D. Dunson
- Corresponding author: D. Dunson
InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Bowen Lei; Rajarshi Guhaniyogi; Krishnendu Chandra; Aaron Scheffler; Bani Mallick
- Corresponding author: Bani Mallick
Data Sketching and Stacking: A Confluence of Two Strategies for Predictive Inference in Gaussian Process Regressions with High-Dimensional Features
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Samuel Gailliot; Rajarshi Guhaniyogi; Roger D. Peng
- Corresponding author: Roger D. Peng
Other grants held by Rajarshi Guhaniyogi
Collaborative Research: Use of Random Compression Matrices For Scalable Inference in High Dimensional Structured Regressions
- Award number: 2210672
- Fiscal year: 2022
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
- Award number: 2220840
- Fiscal year: 2021
- Amount: $172,000
- Project type: Standard Grant
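Similar NSFC grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Amount: CNY 240,000
- Project type: Special fund project
Cell Research (细胞研究)
- Award number: 30824808
- Approval year: 2008
- Amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Amount: CNY 450,000
- Project type: General Program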
Similar overseas grants
Collaborative Research: SHF: Small: Tangram: Scaling into the Exascale Era with Reconfigurable Aggregated "Virtual Chips"
- Award number: 2245129
- Fiscal year: 2022
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
- Award number: 2220840
- Fiscal year: 2021
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: SHF: Small: Tangram: Scaling into the Exascale Era with Reconfigurable Aggregated "Virtual Chips"
- Award number: 2124525
- Fiscal year: 2021
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: SHF: Small: Tangram: Scaling into the Exascale Era with Reconfigurable Aggregated "Virtual Chips"
- Award number: 2008911
- Fiscal year: 2020
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: SHF: Small: Tangram: Scaling into the Exascale Era with Reconfigurable Aggregated "Virtual Chips"
- Award number: 2007796
- Fiscal year: 2020
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: SHF: Small: Tangram: Scaling into the Exascale Era with Reconfigurable Aggregated "Virtual Chips"
- Award number: 2008477
- Fiscal year: 2020
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
- Award number: 1854667
- Fiscal year: 2019
- Amount: $172,000
- Project type: Standard Grant
Collaborative Research: Development of 2D IR Spectroscopy as a Quantitative Probe of Protein Structure, with Applications to Membrane and Aggregated Proteins
- Award number: 0832580
- Fiscal year: 2008
- Amount: $172,000
- Project type: Continuing Grant
Collaborative Research: Development of 2D IR Spectroscopy as a Quantitative Probe of Protein Structure, with Applications to Membrane and Aggregated Proteins
- Award number: 0832591
- Fiscal year: 2008
- Amount: $172,000
- Project type: Continuing Grant
Collaborative Research: Development of 2D IR Spectroscopy as a Quantitative Probe of Protein Structure, with Applications to Membrane and Aggregated Proteins
- Award number: 0832584
- Fiscal year: 2008
- Amount: $172,000
- Project type: Continuing Grant