CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
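Basic information
- Award number: 1752614
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-07-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract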
In an era of Big Data, computer-intensive statistical inference faces unprecedented challenges and opportunities. High-dimensional and massive data are now emerging in scientific areas including biomedical engineering, environmental science, financial econometrics, array signal processing, and social networks, among many others. An important associated research challenge is to develop efficient methods to extract information and quantify its uncertainty for a large number of variables and measurements. Data-driven statistical inferential procedures for uncertainty quantification via the bootstrap methods are often computationally intensive for high-dimensional large-scale datasets. On the computational side, this research project will make use of distributed inference via the parallel high-performance computing technique, which is an essential ingredient to speed up bootstrap calculations. On the statistical side, this research will introduce a general framework for studying the performance of various bootstrap methods. This research project aims to lead to a comprehensive understanding of the fundamental tradeoff between statistical and computational concerns in quantifying uncertainty for a broad class of inferential procedures, thus providing guidance to practically optimize statistical accuracy and computational cost in potential real applications. Both undergraduate and graduate students are involved in the project.
The overarching goal of this research project is to provide new insights and deepen the theoretical understanding of strengths and fundamental limitations of fully data-dependent inferential procedures (such as bootstraps) in the high-dimensional and massive data framework on two classical problems: i) change point detection and identification; ii) computationally-aware statistical inference for U-statistics. The research aims to develop statistically correct and computationally scalable inferential procedures when the dimension can be larger (or even much larger) than the sample size. In contrast to existing work, the methods under development have strong theoretical guarantees, are robust under mild assumptions, require no tuning, and are easy to parallelize. Of practical interest, the research will develop needed software tools for researchers from disciplines with applications of high-dimensional and nonparametric statistics. Theoretical contributions of the proposed research include establishing new approximation and coupling theorems (under weaker regularity conditions than existing literature) in high-dimensional and infinite-dimensional spaces of increasing dimension and complexity, where classical probability tools such as the central limit theorem and extreme value theory are no longer applicable. The mathematical theory is of independent interest and will provide powerful new tools to analyze other statistical procedures on high-dimensional and nonparametric models.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (15)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Mean-field nonparametric estimation of interacting particle systems
- DOI:
- Publication date: 2022-05
- Journal:
- Impact factor: 0
- Authors: Rentian Yao; Xiaohui Chen; Yun Yang
- Corresponding authors: Rentian Yao; Xiaohui Chen; Yun Yang
Hanson–Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data
- DOI: 10.3150/20-bej1251
- Publication date: 2021
- Journal:
- Impact factor: 1.5
- Authors: Chen, Xiaohui; Yang, Yun
- Corresponding author: Yang, Yun
Diffusion K-means clustering on manifolds: Provable exact recovery via semidefinite relaxations
- DOI: 10.1016/j.acha.2020.03.002
- Publication date: 2021-02-19
- Journal:
- Impact factor: 2.5
- Authors: Chen, Xiaohui; Yang, Yun
- Corresponding author: Yang, Yun
Cutoff for Exact Recovery of Gaussian Mixture Models
- DOI: 10.1109/tit.2021.3063155
- Publication date: 2021-06-01
- Journal:
- Impact factor: 2.5
- Authors: Chen, Xiaohui; Yang, Yun
- Corresponding author: Yang, Yun
Sketch-and-Lift: Scalable Subsampled Semidefinite Program for K-means Clustering
- DOI:
- Publication date: 2022-01
- Journal:
- Impact factor: 0
- Authors: Yubo Zhuang; Xiaohui Chen; Yun Yang
- Corresponding authors: Yubo Zhuang; Xiaohui Chen; Yun Yang
Other publications by Xiaohui Chen
Nanogel-based scaffolds fabricated for bone regeneration with mesoporous bioactive glass and strontium: In vitro and in vivo characterization
- DOI: 10.1002/jbm.a.35980
- Publication date: 2017
- Journal:
- Impact factor: 0
- Authors: Qiao Zhang; Xiaohui Chen; Shinan Geng; Lingfei Wei; Richard J Miron; Yanbing Zhao; Yufeng Zhang
- Corresponding author: Yufeng Zhang
Simultaneous determination of eight active components in chloroform extracts from raw and vinegar-processed Genkwa flos using HPLC-MS and identification of the hepatotoxic ingredients with an HL-7702 cell
- DOI:
- Publication date: 2014
- Journal:
- Impact factor: 3.1
- Authors: Yuanyuan Zhang; Ruowen Zhang; Yang Yuan; Lulu Geng; Xu Zhao; Xia Meng; Hefei Zhuang; Kaishun Bi; Xiaohui Chen
- Corresponding author: Xiaohui Chen
Green synthesis of submicron-sized Ti-rich MWW zeolite powders via a novel mechanochemical dry gel conversion in mixed steam environment.
- DOI: 10.1016/j.apt.2020.02.037
- Publication date:
- Journal:
- Impact factor: 5.2
- Authors: Mingyi Zhang; Zenan Lin; Qingming Huang; Xiaohui Chen
- Corresponding author: Xiaohui Chen
Pore Size Estimation of Mesoporous Materials through Surfactant Self-Assembly Prediction
- DOI: 10.1515/zpch-2014-0534
- Publication date: 2014-08
- Journal:
- Impact factor: 0
- Authors: Chengzhi Xu; Xiaohui Chen; Kemei Wei
- Corresponding author: Kemei Wei
Synergetic effect and mechanism between propylene carbonate and polymer rich in ester and ether groups for CO2 physical absorption
- DOI: 10.1016/j.jclepro.2022.130389
- Publication date: 2022
- Journal:
- Impact factor: 11.1
- Authors: Yun Li; Meng Dai; Xiaohui Chen; Yongpeng Yang; Mo Yang; Weijia Huang; Ping Cheng
- Corresponding author: Ping Cheng
Other grants by Xiaohui Chen
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Award number: 2347760
- Fiscal year: 2023
- Award amount: $400,000
- Award type: Continuing Grant
Developing an MND oral health care pathway and a dynamic toolkit
- Award number: ES/Y008200/1
- Fiscal year: 2023
- Award amount: $400,000
- Award type: Research Grant
Collaborative Research: Second Order Inference for High-Dimensional Time Series and Its Applications
- Award number: 1404891
- Fiscal year: 2014
- Award amount: $400,000
- Award type: Continuing Grant
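Similar NSFC grants
Research on absolute interferometric testing methods for optical aspheric surfaces based on multiple computer-generated holograms (CGH)
- Award number: 62375132
- Approval year: 2023
- Award amount: CNY 540,000
- Award type: General Program
Journal of Computer Science and Technology
- Award number: 61224001
- Approval year: 2012
- Award amount: CNY 200,000
- Award type: Special Fund Project
Journal of Computer Science and Technology
- Award number: 61040017
- Approval year: 2010
- Award amount: CNY 40,000
- Award type: Special Fund Project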
Similar overseas grants
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Award number: 2347760
- Fiscal year: 2023
- Award amount: $400,000
- Award type: Continuing Grant
Computer Intensive Methods in Sampling and in Adaptive Contexts
- Award number: RGPIN-2016-05686
- Fiscal year: 2022
- Award amount: $400,000
- Award type: Discovery Grants Program - Individual
Impact of workstation and wearable technologies on musculoskeletal disorder risk in computer-intensive sedentary environments
- Award number: RGPIN-2020-05591
- Fiscal year: 2022
- Award amount: $400,000
- Award type: Discovery Grants Program - Individual
Impact of workstation and wearable technologies on musculoskeletal disorder risk in computer-intensive sedentary environments
- Award number: RGPIN-2020-05591
- Fiscal year: 2021
- Award amount: $400,000
- Award type: Discovery Grants Program - Individual
Impact of workstation and wearable technologies on musculoskeletal disorder risk in computer-intensive sedentary environments
- Award number: RGPIN-2020-05591
- Fiscal year: 2020
- Award amount: $400,000
- Award type: Discovery Grants Program - Individual
Computer-intensive Inference with Applications to Social Sciences
- Award number: 1949845
- Fiscal year: 2020
- Award amount: $400,000
- Award type: Standard Grant
Computer-Intensive Methods for Nonparametric Analysis of Dependent Data
- Award number: 1914556
- Fiscal year: 2019
- Award amount: $400,000
- Award type: Standard Grant
Computer Intensive Methods in Sampling and in Adaptive Contexts
- Award number: RGPIN-2016-05686
- Fiscal year: 2019
- Award amount: $400,000
- Award type: Discovery Grants Program - Individual
BAsIC - Brain-computer-interface practical Application in the Intensive Care unit: a pilot study
- Award number: 9808271
- Fiscal year: 2019
- Award amount: $400,000
- Award type:
Computer Intensive Methods in Sampling and in Adaptive Contexts
- Award number: RGPIN-2016-05686
- Fiscal year: 2018
- Award amount: $400,000
- Award type: Discovery Grants Program - Individual