Statistical Machine Learning Methods for Complex Data Sets
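Basic Information
- Award number: 1811315
- Principal investigator:
- Amount: $120,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Start and end dates: 2018-08-01 to 2019-10-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract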
Recent advances in science and technology have led to the generation of massive amounts of large-scale data with complex structures, including genomics, neuroimaging, and microbiology data. These large-scale data sets pose significant statistical and computational challenges to data analysis. First, widely used statistical methods yield unstable estimates and do not scale computationally to large-scale data sets. Second, complex data sets are often accompanied by outliers, possibly due to measurement error or heavy-tailed random noise. For instance, in genomic studies, it has been observed that the distribution of gene expression levels is generally heavy-tailed, that is, the data contain many extremely large values. Classical statistical methods will yield biased estimates and spurious scientific discoveries if these outliers are not taken into account during model estimation and inference. This project aims to develop scalable and robust multivariate statistical methods to address these problems.
In this project, the investigator uses a combination of regularization and statistical optimization techniques to develop novel multivariate statistical methods for analyzing complex high-dimensional data sets. The first part of the project concerns the sparse generalized eigenvalue problem, which arises naturally in many statistical models such as partial least squares, canonical correlation analysis, sufficient dimension reduction, and Fisher's discriminant analysis. The investigator will develop a general framework for solving the sparse generalized eigenvalue problem and make available a wide range of statistical models for analyzing high-dimensional data. Furthermore, the investigator will study the theoretical properties of the sparse generalized eigenvalue problem, which will lead to a better understanding of various statistical models that were previously not well understood in the high-dimensional setting.
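As an illustration of the sparse generalized eigenvalue problem (find a sparse vector v that maximizes the Rayleigh quotient v'Av / v'Bv), here is a minimal Python sketch. It is loosely modeled on the truncated Rayleigh flow idea named in the title of one of the project's listed papers, but it is not the investigators' implementation: the function name, the step size `eta`, the iteration count, the initialization, and the toy matrices are all assumptions made for this example.

```python
import numpy as np

def truncated_rayleigh_flow(A, B, v_init, k, eta=0.25, n_iter=50):
    """Sketch: approximate a k-sparse leading generalized eigenvector of (A, B).

    Assumes A is symmetric, B is symmetric positive definite, and the
    Rayleigh quotient stays positive along the iterations.
    """
    v = v_init / np.linalg.norm(v_init)
    for _ in range(n_iter):
        rho = (v @ A @ v) / (v @ B @ v)                # current Rayleigh quotient
        v = v + (eta / rho) * (A @ v - rho * (B @ v))  # gradient-ascent style step
        v = v / np.linalg.norm(v)
        keep = np.argsort(np.abs(v))[-k:]              # indices of the k largest entries
        truncated = np.zeros_like(v)
        truncated[keep] = v[keep]                      # zero out everything else
        v = truncated / np.linalg.norm(truncated)
    rho = (v @ A @ v) / (v @ B @ v)
    return v, rho

# Hypothetical toy problem: the leading generalized eigenvector is supported
# on the first two coordinates, with generalized eigenvalue 6.
p = 10
v0 = np.zeros(p)
v0[:2] = 1.0 / np.sqrt(2.0)
A = 5.0 * np.outer(v0, v0) + np.eye(p)
B = np.diag([1.0, 1.0] + [2.0] * (p - 2))

v_hat, rho_hat = truncated_rayleigh_flow(A, B, np.ones(p), k=2)
```

In this toy setting the truncation step recovers the two-coordinate support after the first iteration. On real data the result depends heavily on the initial vector and on the choice of `k` and `eta`, none of which are specified in the abstract.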
The second part of the research project focuses on a class of robust sparse reduced rank regression models. The investigator will develop efficient algorithms and high-dimensional asymptotic analysis for the resulting estimators under the Huber loss function, and quantify the bias-robustness tradeoff between using the Huber loss and the squared error loss. This research project will also deliver easy-to-use software packages for applying the developed methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
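The contrast between the Huber loss and the squared error loss mentioned in the abstract can be illustrated with a small Python sketch. The threshold name `tau` and the example residuals are assumptions for illustration; the abstract does not specify a parameterization. The Huber loss is quadratic for residuals of magnitude at most `tau` and grows only linearly beyond it, so a single large outlier contributes far less than under squared error.

```python
import numpy as np

def huber_loss(r, tau):
    """Huber loss: 0.5*r^2 if |r| <= tau, else tau*|r| - 0.5*tau^2."""
    r = np.asarray(r, dtype=float)
    quadratic = 0.5 * r ** 2
    linear = tau * np.abs(r) - 0.5 * tau ** 2
    return np.where(np.abs(r) <= tau, quadratic, linear)

def squared_loss(r):
    """Squared error loss with the same 0.5 scaling, for comparison."""
    r = np.asarray(r, dtype=float)
    return 0.5 * r ** 2

# Hypothetical residuals: two moderate values and one outlier.
residuals = np.array([0.5, -1.0, 10.0])
h = huber_loss(residuals, tau=1.0)   # array([0.125, 0.5, 9.5])
s = squared_loss(residuals)          # array([0.125, 0.5, 50.0])
```

A large `tau` makes the Huber loss coincide with squared error on most residuals, while a small `tau` limits the influence of outliers but can bias the estimate when the noise is asymmetric; this is the kind of tradeoff the abstract proposes to quantify.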
Project Outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Graphical Nonconvex Optimization via an Adaptive Convex Relaxation
- DOI:
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: Qiang Sun; Kean Ming Tan; Han Liu; Tong Zhang
- Corresponding authors: Qiang Sun; Kean Ming Tan; Han Liu; Tong Zhang
Propagation of Information Along the Cortical Hierarchy as a Function of Attention While Reading and Listening to Stories
- DOI: 10.1093/cercor/bhy282
- Publication date: 2019-10-01
- Journal:
- Impact factor: 3.7
- Authors: Regev, Mor; Simony, Erez; Hasson, Uri
- Corresponding author: Hasson, Uri
Sparse generalized eigenvalue problem: optimal statistical rates via truncated Rayleigh flow
- DOI: 10.1111/rssb.12291
- Publication date: 2018-11-01
- Journal:
- Impact factor: 5.8
- Authors: Tan, Kean Ming; Wang, Zhaoran; Zhang, Tong
- Corresponding author: Zhang, Tong
Other publications by Kean Ming Tan
Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Ouyang Jing; Chengyu Cui; Kean Ming Tan; Gongjun Xu
- Corresponding author: Gongjun Xu
Selection Bias Correction and Effect Size Estimation under Dependence
- DOI:
- Publication date: 2014
- Journal:
- Impact factor: 0
- Authors: Kean Ming Tan; N. Simon; D. Witten
- Corresponding author: D. Witten
Robust convex clustering: How does the fusion penalty enhance robustness?
- DOI:
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Chenyu Liu; Qiang Sun; Kean Ming Tan
- Corresponding author: Kean Ming Tan
Estimation of complier expected shortfall treatment effects with a binary instrumental variable
- DOI: 10.1016/j.jeconom.2023.105572
- Publication date: 2023
- Journal:
- Impact factor: 6.3
- Authors: B. Wei; Kean Ming Tan; Xuming He
- Corresponding author: Xuming He
Supplement to "Smoothed Quantile Regression with Large-Scale Inference"
- DOI:
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Xuming He; Xiaoou Pan; Kean Ming Tan; Wen
- Corresponding author: Wen
Other grants held by Kean Ming Tan
CAREER: Super-Quantile Based Methods for Analyzing Large-Scale Heterogenous Data
- Award number: 2238428
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Continuing Grant
Collaborative Research: Inference and Decentralized Computing for Quantile Regression and Other Non-Smooth Methods
- Award number: 2113346
- Fiscal year: 2021
- Funding amount: $120,000
- Project type: Standard Grant
Statistical Machine Learning Methods for Complex Data Sets
- Award number: 1949730
- Fiscal year: 2019
- Funding amount: $120,000
- Project type: Standard Grant
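Similar NSFC (National Natural Science Foundation of China) grants
Understanding structural evolution of galaxies with machine learning
- Award number: n/a
- Approval year: 2022
- Funding amount: 100,000 yuan
- Project type: Provincial/municipal-level project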
Similar overseas grants
Comparison of Machine Learning and Conventional Statistical Modeling for Predicting Readmission Following Acute Heart Failure Hospitalization
- Award number: 495410
- Fiscal year: 2023
- Funding amount: $120,000
- Project type:
Modern Statistics and Statistical Machine Learning
- Award number: 2886365
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Studentship
Modern Statistics and Statistical Machine Learning
- Award number: 2886852
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Studentship
EAGER: SSMCDAT2023: Revealing Local Symmetry Breaking in Intermetallics: Combining Statistical Mechanics and Machine Learning in PDF Analysis
- Award number: 2334261
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Standard Grant
REU Site: University of North Carolina at Greensboro - Complex Data Analysis using Statistical and Machine Learning Tools
- Award number: 2244160
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Standard Grant
Next-Generation Algorithms in Statistical Genetics Based on Modern Machine Learning
- Award number: 10714930
- Fiscal year: 2023
- Funding amount: $120,000
- Project type:
Unravel machine learning blackboxes -- A general, effective and performance-guaranteed statistical framework for complex and irregular inference problems in data science
- Award number: 2311064
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Standard Grant
A Novel Approach to Semi-Supervised Statistical Machine Learning
- Award number: DP230101671
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Discovery Projects
Modern Statistics and Statistical Machine Learning
- Award number: 2886723
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Studentship
Modern Statistics and Statistical Machine Learning
- Award number: 2886777
- Fiscal year: 2023
- Funding amount: $120,000
- Project type: Studentship