Scalable Algorithms for Bayesian On-Line Learning with Large-Scale Dynamic Data
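Basic information
- Award number: 2015498
- Principal investigator:
- Amount: $250,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-08-01 to 2024-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract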
Bayesian methods provide a principled way of assessing model uncertainty in machine learning with big data, which is critical to the development of trustworthy artificial intelligence (AI). However, the lack of efficient Monte Carlo algorithms has severely hindered applications of Bayesian methods in the big data era: compared to frequentist methods, Bayesian methods are often much slower. To tackle this difficulty, a variety of scalable Monte Carlo algorithms have been developed in the recent literature. However, these algorithms apply only to static data; none of them can be directly applied to dynamic data. Many problems central to data science, such as natural language processing, autonomous driving, and weather forecasting, involve dynamic data. Traditional particle filters and sequential Monte Carlo algorithms lack the scalability needed for large-scale dynamic data. By reformulating the ensemble Kalman filter (EnKF) under the framework of Langevin dynamics, this project proposes the Langevinized EnKF as a general and scalable stochastic gradient sequential Monte Carlo algorithm for Bayesian on-line learning with large-scale dynamic data. The Langevinized EnKF improves uncertainty quantification for a wide class of data assimilation problems, advancing the development of trustworthy AI. Successful completion of this project will generate a set of scalable and theoretically rigorous algorithms for Bayesian on-line learning, which can significantly benefit the development of data-driven technologies. The research results will be disseminated to communities of interest via collaborations, publications, and conference presentations. The project will also have a significant impact on education through the direct involvement of graduate students and the incorporation of research results into undergraduate and graduate courses.
Although the EnKF has been extremely successful in dealing with complex dynamic data encountered in oceanography, reservoir modeling, and weather forecasting, it does not converge to the correct filtering distribution except for linear systems in the large-ensemble limit. The Langevinized EnKF resolves this issue: it converges to the correct filtering distribution in data assimilation and is thus able to quantify the uncertainty of the underlying dynamic system. The Langevinized EnKF can also be used for Bayesian learning with large-scale static data, by reformulating the Bayesian inverse problem as a state-space model using Langevin dynamics and subsampling. Different variants of the Langevinized EnKF will be developed to extend its applications to non-Gaussian data and incomplete data. As a whole, this project will provide a complete treatment of Bayesian analysis of big data. The Langevinized EnKF can be applied to big data problems across data scenarios classified along several dimensions: dynamic or static, Gaussian or non-Gaussian, and complete or incomplete. The statistical theory underlying the Langevinized EnKF will be rigorously studied. Scientific applications, including language modeling and dynamic network analysis, will be carried out. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
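For readers unfamiliar with the baseline being modified, the following is a minimal sketch of one analysis step of the classical stochastic (perturbed-observation) EnKF for a linear-Gaussian observation model. It is not the Langevinized EnKF proposed in this project, whose details the abstract does not give. The function name `enkf_analysis`, the observation matrix `H`, the noise covariance `R`, and the toy numbers are all assumptions chosen for illustration.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """One classical stochastic EnKF analysis step (illustrative only).

    ensemble: (m, d) forecast ensemble; y: (p,) observation;
    H: (p, d) linear observation matrix; R: (p, p) noise covariance.
    """
    m = ensemble.shape[0]
    # Sample covariance of the forecast ensemble, shape (d, d).
    C = np.atleast_2d(np.cov(ensemble, rowvar=False))
    # Kalman gain K = C H^T (H C H^T + R)^{-1}, shape (d, p).
    S = H @ C @ H.T + R
    K = np.linalg.solve(S, H @ C).T
    # Draw an independently perturbed observation for each member.
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=m)
    # Move each member toward its perturbed observation.
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

# Toy linear-Gaussian check (hypothetical numbers): prior N(0, I) in 2-D,
# observing only the first coordinate with noise variance 0.25.
rng = np.random.default_rng(0)
prior = rng.standard_normal((20000, 2))
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
y = np.array([1.0])
posterior = enkf_analysis(prior, y, H, R, rng)
print(posterior.mean(axis=0))
```

In this toy case the exact Kalman posterior mean is (0.8, 0), so the printed ensemble mean should land close to it. That is the linear, large-ensemble regime in which the abstract says the classical EnKF is consistent; outside it, the gain-based update above is only an approximation.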
Project outputs
Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Nearly optimal Bayesian shrinkage for high-dimensional regression
- DOI: 10.1007/s11425-020-1912-6
- Published: 2017-12
- Journal:
- Impact factor: 0
- Authors: Qifan Song; F. Liang
- Corresponding authors: Qifan Song; F. Liang
Bayesian Analysis of Exponential Random Graph Models Using Stochastic Gradient Markov Chain Monte Carlo
- DOI: 10.1214/23-ba1364
- Published: 2024-06-01
- Journal:
- Impact factor: 4.4
- Authors: Zhang, Qian; Liang, Faming
- Corresponding author: Liang, Faming
Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network
- DOI: 10.48550/arxiv.2210.04349
- Published: 2022-10
- Journal:
- Impact factor: 0
- Authors: Siqi Liang; Y. Sun; F. Liang
- Corresponding authors: Siqi Liang; Y. Sun; F. Liang
Learning Sparse Deep Neural Networks with a Spike-and-Slab Prior.
- DOI: 10.1016/j.spl.2021.109246
- Published: 2022-01
- Journal:
- Impact factor: 0.8
- Authors: Y. Sun; Qifan Song; F. Liang
- Corresponding authors: Y. Sun; Qifan Song; F. Liang
Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration
- DOI:
- Published: 2021-10
- Journal:
- Impact factor: 0
- Authors: Y. Sun; Wenjun Xiong; F. Liang
- Corresponding authors: Y. Sun; Wenjun Xiong; F. Liang
Other publications by Faming Liang
Bayesian phylogeny analysis via stochastic approximation Monte Carlo
- DOI: 10.1016/j.ympev.2009.06.019
- Published: 2009-11-01
- Journal:
- Impact factor:
- Authors: Sooyoung Cheon; Faming Liang
- Corresponding author: Faming Liang
Networks Involved in Coronary Collateral Formation
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Jian Zhang; J. Regieli; M. Schipper; M. M. Entius; Faming Liang; J. Koerselman; H. J. Ruven; Yolanda van der Graaf; D. Grobbee; Pieter A. Doevendans
- Corresponding author: Pieter A. Doevendans
A New Paradigm for Generative Adversarial Networks Based on Randomized Decision Rules
- DOI:
- Published: 2023
- Journal:
- Impact factor: 1.4
- Authors: Sehwan Kim; Qifan Song; Faming Liang
- Corresponding author: Faming Liang
An extended Langevinized ensemble Kalman filter for non-Gaussian dynamic systems
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Peiyi Zhang; Tianning Dong; Faming Liang
- Corresponding author: Faming Liang
Fast Value Tracking for Deep Reinforcement Learning
- DOI: 10.48550/arxiv.2403.13178
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Frank Shih; Faming Liang
- Corresponding author: Faming Liang
Other grants held by Faming Liang
A New Stochastic Neural Network: Statistical Perspectives and Applications
- Award number: 2210819
- Fiscal year: 2022
- Amount: $250,000
- Award type: Standard Grant
Statistical Inference for Biomedical Big Data: Theory, Methods, and Tools
- Award number: 1703077
- Fiscal year: 2017
- Amount: $250,000
- Award type: Standard Grant
On Statistical Modeling and Parameter Estimation for High Dimensional Systems
- Award number: 1818674
- Fiscal year: 2017
- Amount: $250,000
- Award type: Standard Grant
On Statistical Modeling and Parameter Estimation for High Dimensional Systems
- Award number: 1612924
- Fiscal year: 2016
- Amount: $250,000
- Award type: Standard Grant
Monte Carlo Methods for Analysis of Large Spatial Data
- Award number: 1545738
- Fiscal year: 2015
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: Efficient Parallel Iterative Monte Carlo Methods for Statistical Analysis of Big Data
- Award number: 1545202
- Fiscal year: 2015
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: Efficient Parallel Iterative Monte Carlo Methods for Statistical Analysis of Big Data
- Award number: 1317131
- Fiscal year: 2013
- Amount: $250,000
- Award type: Standard Grant
Monte Carlo Methods for Analysis of Large Spatial Data
- Award number: 1106494
- Fiscal year: 2011
- Amount: $250,000
- Award type: Standard Grant
Sampling from Distributions with Intractable Integrals
- Award number: 1007457
- Fiscal year: 2010
- Amount: $250,000
- Award type: Continuing Grant
Development of Stochastic Approximation Monte Carlo Methods
- Award number: 0706755
- Fiscal year: 2007
- Amount: $250,000
- Award type: Standard Grant
Similar overseas grants
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Award number: 2404989
- Fiscal year: 2024
- Amount: $250,000
- Award type: Standard Grant
Designing Bayesian based Adaptive Resource Constrained Hardware Algorithms for Next Generation of Embedded Systems
- Award number: 2890421
- Fiscal year: 2023
- Amount: $250,000
- Award type: Studentship
developing and validating advanced Bayesian optimization algorithms
- Award number: 2885563
- Fiscal year: 2023
- Amount: $250,000
- Award type: Studentship
Investigation and deployment of novel Bayesian inference algorithms in CAVATICA for identifying genomic variants underlying congenital heart defects in Down syndrome individuals
- Award number: 10658217
- Fiscal year: 2023
- Amount: $250,000
- Award type:
Developing Efficient Numerical Algorithms Using Fast Bayesian Random Forests
- Award number: 2748743
- Fiscal year: 2022
- Amount: $250,000
- Award type: Studentship
Advanced Bayesian Inversion Algorithms for Wave Propagation
- Award number: DP220102243
- Fiscal year: 2022
- Amount: $250,000
- Award type: Discovery Projects
PAC-Bayesian transfer learning: theory and algorithms
- Award number: RGPIN-2020-07223
- Fiscal year: 2022
- Amount: $250,000
- Award type: Discovery Grants Program - Individual
Scalable Algorithms for Uncertainty Quantification and Bayesian Inference with Applications to Computational Mechanics
- Award number: RGPIN-2017-06375
- Fiscal year: 2022
- Amount: $250,000
- Award type: Discovery Grants Program - Individual
Scalable Algorithms for Uncertainty Quantification and Bayesian Inference with Applications to Computational Mechanics
- Award number: RGPIN-2017-06375
- Fiscal year: 2021
- Amount: $250,000
- Award type: Discovery Grants Program - Individual