Robust and Distributed Statistical Learning from Big Data
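Basic information
- Award number: 1712591
- Principal investigator:
- Amount: $600,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2017
- Funding country: United States
- Period: 2017-07-15 to 2023-06-30
- Status: Completed
- Source:
- Keywords:
Project abstract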
Big Data are ubiquitous in many areas of science, engineering, social sciences, and the humanities, and have significant impact in terms of technological innovation and economic development. This project seeks to introduce effective methods for robust high-dimensional statistical inference that are insensitive to the potential poor quality of big data, and to develop distributed estimation that is needed for Big Data analysis, computing, and optimization. The research will address several robust and distributed statistical inference problems for Big Data in genomics, genetics, neuroscience, machine learning, economics, and finance. The project will advance our understanding of molecular mechanisms, biological processes, genetic associations, brain functions, and economic and financial risk. Integration of research and education will be achieved through the involvement of undergraduate students, graduate students, and postdoctoral fellows, and the development of publicly available computer code for robust and distributed analysis of Big Data with sound theoretical support. Working closely with industrial partners, the research will lead to increased collaborations between academia and industry.
The project will lead to the development of novel statistical theory, methods, and algorithms for robust statistical inference from high-dimensional statistics and Big Data. The first aim seeks to introduce a simple and widely applicable principle for robust inference via an appropriate shrinkage of observed data or loss functions. This reduces the influence of outliers and heavy-tailed distributions, and weakens the moment conditions from sub-Gaussian distributions to bounded second moments for regression or fourth moments for covariance estimation.
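The shrinkage principle named in the first aim can be illustrated with a truncated mean: each observation is clipped to a band around zero before averaging, so a single extreme value cannot dominate the estimate. The following is a minimal sketch under stated assumptions, not code from the project. The function names, the sample values, and the threshold `tau` are all hypothetical, and the data are assumed to be roughly centered near zero.

```python
def shrink(x, tau):
    """Clip x to the interval [-tau, tau] (shrinkage toward zero)."""
    return max(-tau, min(tau, x))


def truncated_mean(data, tau):
    """Average the shrunk observations instead of the raw ones."""
    return sum(shrink(x, tau) for x in data) / len(data)


# Hypothetical sample: five moderate values and one gross outlier.
data = [0.5, -0.2, 0.1, 0.3, -0.4, 50.0]
tau = 1.0  # assumed tuning parameter, not a value taken from the project

plain = sum(data) / len(data)
robust = truncated_mean(data, tau)
print(f"sample mean:    {plain:.3f}")
print(f"truncated mean: {robust:.3f}")
```

In this toy sample the outlier pulls the ordinary mean far from the bulk of the data, while the truncated mean stays close to it. In practice the threshold would have to be tuned to the sample size and the scale of the data; here it is simply fixed by hand.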
The research includes plans to systematically develop the theory and methods for robust estimation of high-dimensional means, and implementation of these methods to control false discovery rates in large-scale inference for gene and transcript selection, robust regularization of covariance and precision matrices, and their applications to robust principal component analysis, factor analysis, and high-dimensional hypothesis testing. In addition, robust sparse regression, model selection, and low-rank matrix recovery will also be investigated. The second aim focuses on making the proposed robust procedures applicable to the Big Data environment via the development of distributed estimation and inference. In particular, divide-and-conquer methods will be used to distribute the computation to node machines and to solve privacy and data ownership issues. Approaches to reduce the information loss due to the distributed computation for likelihood-based models via partial communication of the Hessian matrices will be investigated. Two important classes of problems, trace regression and principal component analysis, will be used to illustrate the proposed methods.
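The divide-and-conquer idea in the second aim can be sketched with a deliberately simple model. Each node fits its own estimate on local data and sends only that number to a central machine, which averages the results, so raw observations never leave the node. This is an illustrative sketch only: it uses a one-dimensional least-squares slope through the origin rather than trace regression or principal component analysis, and the function names and data shards are hypothetical.

```python
def local_slope(xs, ys):
    """Least-squares slope through the origin, computed on one node's data."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def divide_and_conquer_slope(shards):
    """Average the per-node estimates; only one number leaves each node."""
    local_estimates = [local_slope(xs, ys) for xs, ys in shards]
    return sum(local_estimates) / len(local_estimates)


# Hypothetical shards, one (xs, ys) pair per node, with y roughly equal to 2x.
shards = [
    ([1.0, 2.0, 3.0], [2.1, 3.9, 6.0]),
    ([1.0, 2.0, 4.0], [1.9, 4.2, 8.1]),
    ([2.0, 3.0, 5.0], [4.0, 6.1, 9.8]),
]

estimate = divide_and_conquer_slope(shards)
print(f"averaged slope estimate: {estimate:.4f}")
```

Plain averaging of local estimates is the simplest aggregation rule and can lose information relative to a fit on the pooled data. The abstract's mention of partially communicating Hessian matrices refers to reducing that loss, which this sketch does not attempt.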
Project outcomes
Journal articles (38)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Spectral Methods for Data Science: A Statistical Perspective
- DOI: 10.1561/2200000079
- Published: 2021-01-01
- Journal:
- Impact factor: 32.8
- Authors: Chen, Yuxin; Chi, Yuejie; Ma, Cong
- Corresponding author: Ma, Cong
DISTRIBUTED ESTIMATION OF PRINCIPAL EIGENSPACES
- DOI: 10.1214/18-aos1713
- Published: 2019-12-01
- Journal:
- Impact factor: 4.5
- Authors: Fan, Jianqing; Wang, Dong; Zhu, Ziwei
- Corresponding author: Zhu, Ziwei
An ℓp theory of PCA and spectral clustering
- DOI: 10.1214/22-aos2196
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Abbe, Emmanuel; Fan, Jianqing; Wang, Kaizheng
- Corresponding author: Wang, Kaizheng
Augmented factor models with applications to validating market risk factors and forecasting bond risk premia
- DOI: 10.1016/j.jeconom.2020.07.002
- Published: 2021
- Journal:
- Impact factor: 6.3
- Authors: Fan, Jianqing; Ke, Yuan; Liao, Yuan
- Corresponding author: Liao, Yuan
Communication-Efficient Accurate Statistical Estimation
- DOI: 10.1080/01621459.2021.1969238
- Published: 2019-06
- Journal:
- Impact factor: 3.7
- Authors: Jianqing Fan; Yongyi Guo; Kaizheng Wang
- Corresponding author: Jianqing Fan; Yongyi Guo; Kaizheng Wang
Other publications by Jianqing Fan
Deep Neural Networks for Nonparametric Interaction Models with Diverging Dimension
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Sohom Bhattacharya; Jianqing Fan; Debarghya Mukherjee
- Corresponding author: Debarghya Mukherjee
Dynamic nonparametric filtering with application to volatility estimation
- DOI: 10.1016/b978-044451378-6/50021-1
- Published: 2003
- Journal:
- Impact factor: 0
- Authors: Ming; Jianqing Fan; V. Spokoiny
- Corresponding author: V. Spokoiny
Approaches to High-Dimensional Covariance and Precision Matrix Estimations
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Jianqing Fan; Yuan Liao; Han Liu
- Corresponding author: Han Liu
Improving Covariate Balancing Propensity Score: A Doubly Robust and Efficient Approach
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Jianqing Fan; K. Imai; Han Liu; Y. Ning; Xiaolin Yang
- Corresponding author: Xiaolin Yang
Features of Big Data and sparsest solution in high confidence set
- DOI: 10.1201/b16720-48
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Jianqing Fan
- Corresponding author: Jianqing Fan
Other grants held by Jianqing Fan
Interface of Statistical Learning and Optimal Decisions
- Award number: 2210833
- Fiscal year: 2022
- Funding amount: $600,000
- Award type: Continuing Grant
DMS/NIGMS 2: Collaborative Research: Developing Statistical Learning Methods for Revealing the Molecular Signatures of Microvascular Changes in Neural Injury
- Award number: 2053832
- Fiscal year: 2021
- Funding amount: $600,000
- Award type: Continuing Grant
FRG: Collaborative Research: Flexible Network Inference
- Award number: 2052926
- Fiscal year: 2021
- Funding amount: $600,000
- Award type: Standard Grant
Collaborative Research: Statistical Methods for RNA-seq Based Transcriptomic Analysis of Macrophage Function in Spinal Cord Injury
- Award number: 1662139
- Fiscal year: 2017
- Funding amount: $600,000
- Award type: Continuing Grant
Collaborative Research: Interface of Probability and Statistics for High-dimensional Inference
- Award number: 1406266
- Fiscal year: 2014
- Funding amount: $600,000
- Award type: Continuing Grant
Workshop on: Discovery in Complex or Massive Datasets: Common Statistical Themes
- Award number: 0751568
- Fiscal year: 2007
- Funding amount: $600,000
- Award type: Standard Grant
Collaborative Research: Development of bioinformatic methods for studying gene expression network inflammation and neuronal regeneration
- Award number: 0714554
- Fiscal year: 2007
- Funding amount: $600,000
- Award type: Continuing Grant
High-dimensional statistical learning and inference
- Award number: 0704337
- Fiscal year: 2007
- Funding amount: $600,000
- Award type: Continuing Grant
Workshop on Frontiers of Statistics: Nonparametric Modeling of Complex Data
- Award number: 0531839
- Fiscal year: 2006
- Funding amount: $600,000
- Award type: Standard Grant
Similar NSFC grants
Graphon mean field games with partial observation and application to failure detection in distributed systems
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (unit: 10,000 CNY)
- Award type: Provincial/municipal project
Similar overseas grants
Statistical learning algorithms for high-dimensional non-normally distributed data
- Award number: RGPIN-2018-06787
- Fiscal year: 2022
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
Developing Statistical Tools and Visualization Methods for Understanding Heterogeneity in Distributed Networks: Applications to COVID-19 and Diabetes
- Award number: 468555
- Fiscal year: 2022
- Funding amount: $600,000
- Award type: Operating Grants
Statistical learning algorithms for high-dimensional non-normally distributed data
- Award number: RGPIN-2018-06787
- Fiscal year: 2021
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
Statistical inference for distributed datasets
- Award number: RGPIN-2016-06296
- Fiscal year: 2021
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Award number: RGPIN-2016-05024
- Fiscal year: 2021
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Award number: RGPIN-2016-05024
- Fiscal year: 2020
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
Statistical learning algorithms for high-dimensional non-normally distributed data
- Award number: RGPIN-2018-06787
- Fiscal year: 2020
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
Statistical inference for distributed datasets
- Award number: RGPIN-2016-06296
- Fiscal year: 2020
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
Statistical learning algorithms for high-dimensional non-normally distributed data
- Award number: RGPIN-2018-06787
- Fiscal year: 2019
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual
Statistical inference for distributed datasets
- Award number: RGPIN-2016-06296
- Fiscal year: 2019
- Funding amount: $600,000
- Award type: Discovery Grants Program - Individual