
Scalable Algorithms for Bayesian On-Line Learning with Large-Scale Dynamic Data

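Basic information

Award number:
2015498
Principal investigator:
Faming Liang
Amount:
$250,000
Institution:
Purdue University
Institution country:
United States
Project type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Period:
2020-08-01 to 2024-07-31
Project status:
Completed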

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2015498&HistoricalAwards=false
Keywords:
Scalable Algorithms Bayesian Line Learning

Project abstract

Bayesian methods provide a principled way to assess model uncertainty in machine learning of big data, which is critical to the development of trustworthy artificial intelligence (AI). However, the lack of efficient Monte Carlo algorithms has drastically hindered applications of Bayesian methods in the big data era. Compared to frequentist methods, Bayesian methods are often much slower. To tackle this difficulty, a variety of scalable Monte Carlo algorithms have been developed in the recent literature. However, these algorithms can only be applied to static data; none of them can be directly applied to dynamic data. Many problems central to data science, such as natural language processing, autonomous driving, and weather forecasting, face the challenges of dynamic data. Traditional particle filters and sequential Monte Carlo algorithms lack the scalability necessary for dealing with large-scale dynamic data. By reformulating the ensemble Kalman filter (EnKF) under the framework of Langevin dynamics, this project proposes the Langevinized EnKF as a general and scalable stochastic gradient sequential Monte Carlo algorithm for Bayesian on-line learning with large-scale dynamic data. The Langevinized EnKF improves uncertainty quantification for a wide class of data assimilation problems, advancing the development of trustworthy AI. Successful completion of this project will generate a set of scalable and theoretically rigorous algorithms for Bayesian on-line learning, which can provide significant benefits to the development of data-driven technologies. The research results will be disseminated to communities of interest via collaborations, publications, and conference presentations. The project will also have significant impacts on education through direct involvement of graduate students and incorporation of the research results into undergraduate and graduate courses.
Although the EnKF has been extremely successful in dealing with complex dynamic data encountered in oceanography, reservoir modeling, and weather forecasting, it does not converge to the right filtering distribution except for linear systems in the large ensemble limit. The Langevinized EnKF resolves this issue; it converges to the right filtering distribution in data assimilation and is thus able to quantify uncertainty of the underlying dynamic system. The Langevinized EnKF can also be used for Bayesian learning with large-scale static data by reformulating the Bayesian inverse problem as a state-space model with Langevin dynamics and the subsampling technique. Different variants of the Langevinized EnKF will be developed to extend its applications to non-Gaussian data and incomplete data. As a whole, this project will provide a complete treatment for Bayesian analysis of big data. The Langevinized EnKF can be applied to big data problems in various data scenarios: dynamic data and static data, Gaussian data and non-Gaussian data, and complete data and incomplete data, provided the data is classified in different ways. Statistical theory underlying the Langevinized EnKF will be rigorously studied. Exciting scientific applications, including language modeling and dynamic network analysis, will be conducted. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
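
As background for the abstract's remark that the EnKF recovers the right filtering distribution only for linear systems in the large ensemble limit, the following is a minimal sketch of one standard perturbed-observation EnKF analysis step for a scalar linear-Gaussian model. It is not the Langevinized EnKF proposed by the project. The observation model `y = h * x + eps`, the function names, and all numeric values are assumptions chosen only for illustration.

```python
import random
import statistics


def enkf_analysis_step(ensemble, y_obs, h, r, rng):
    """One perturbed-observation EnKF analysis step for a scalar state.

    Assumed model (illustrative, not taken from the award abstract):
        y = h * x + eps,  eps ~ N(0, r)
    """
    # Forecast variance is estimated from the ensemble itself.
    p_f = statistics.variance(ensemble)
    # Kalman gain for a scalar state and a scalar observation.
    gain = p_f * h / (h * h * p_f + r)
    updated = []
    for x in ensemble:
        # Each member is updated with its own noisy copy of the observation.
        y_perturbed = y_obs + rng.gauss(0.0, r ** 0.5)
        updated.append(x + gain * (y_perturbed - h * x))
    return updated


def kalman_posterior(m0, p0, y_obs, h, r):
    """Exact posterior mean and variance for the same linear-Gaussian model."""
    gain = p0 * h / (h * h * p0 + r)
    return m0 + gain * (y_obs - h * m0), (1.0 - gain * h) * p0


# Hypothetical prior N(m0, p0), observation operator h, noise variance r.
rng = random.Random(0)
m0, p0, h, r, y_obs = 0.0, 4.0, 1.0, 1.0, 2.0
forecast = [rng.gauss(m0, p0 ** 0.5) for _ in range(20000)]
analysis = enkf_analysis_step(forecast, y_obs, h, r, rng)
exact_mean, exact_var = kalman_posterior(m0, p0, y_obs, h, r)
print("ensemble mean/variance:", statistics.mean(analysis), statistics.variance(analysis))
print("exact mean/variance:   ", exact_mean, exact_var)
```

With a large ensemble, the sample mean and variance of the updated members should lie close to the exact Kalman posterior in this linear case. For nonlinear models no such agreement is guaranteed, which is the gap the abstract says the Langevinized EnKF is meant to close.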

Project outcomes

Journal articles (13)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Nearly optimal Bayesian shrinkage for high-dimensional regression

DOI:
10.1007/s11425-020-1912-6
Publication date:
2017-12
Journal:
Science China Mathematics
Impact factor:
0
Authors:
Qifan Song; F. Liang
Corresponding authors:
Qifan Song; F. Liang

Bayesian Analysis of Exponential Random Graph Models Using Stochastic Gradient Markov Chain Monte Carlo

DOI:
10.1214/23-ba1364
Publication date:
2024-06-01
Journal:
Bayesian Analysis
Impact factor:
4.4
Authors:
Zhang, Qian; Liang, Faming
Corresponding author:
Liang, Faming

Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network

DOI:
10.48550/arxiv.2210.04349
Publication date:
2022-10
Journal:
ArXiv
Impact factor:
0
Authors:
Siqi Liang; Y. Sun; F. Liang
Corresponding authors:
Siqi Liang; Y. Sun; F. Liang

Learning Sparse Deep Neural Networks with a Spike-and-Slab Prior.

DOI:
10.1016/j.spl.2021.109246
Publication date:
2022-01
Journal:
Statistics & Probability Letters
Impact factor:
0.8
Authors:
Y. Sun; Qifan Song; F. Liang
Corresponding authors:
Y. Sun; Qifan Song; F. Liang

Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration

DOI:
Publication date:
2021-10
Journal:
ArXiv
Impact factor:
0
Authors:
Y. Sun; Wenjun Xiong; F. Liang
Corresponding authors:
Y. Sun; Wenjun Xiong; F. Liang

Other publications by Faming Liang

Bayesian phylogeny analysis via stochastic approximation Monte Carlo

DOI:
10.1016/j.ympev.2009.06.019
Publication date:
2009-11-01
Journal:
Research article
Impact factor:
Authors:
Sooyoung Cheon; Faming Liang
Corresponding author:
Faming Liang

Networks Involved in Coronary Collateral Formation

DOI:
Publication date:
Journal:
Impact factor:
0
Authors:
Jian Zhang; J. Regieli; M. Schipper; M. M. Entius; Faming Liang; J. Koerselman; H. J. Ruven; Yolanda van der Graaf; D. Grobbee; Pieter A. Doevendans
Corresponding author:
Pieter A. Doevendans