Repro Sampling Method: A Transformative Artificial-Sample-Based Inferential Framework with Applications to Discrete Parameter, High-Dimensional Data, and Rare Events Inferences
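Basic Information
- Award number: 2015373
- Principal investigator:
- Amount: $258,600
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-07-01 to 2023-06-30
- Status: Completed
- Source:
- Keywords:
Abstract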
In the era of data science, statistical inference is the cornerstone of extracting useful information from complex data sets. Despite significant progress in statistics, many challenges remain in quantifying uncertainty for complex and high-dimensional data. For instance, inherently discrete parameters and model structures are routinely encountered in data science and machine learning problems, and conventional statistical inference approaches do not apply to these intrinsically discrete problems. This project aims to develop a new inferential framework that addresses statistical inference questions for these difficult problems in high-dimensional and rare-events data analyses. The framework is expected to be transformative, since it will greatly expand the reach of statistical inference and uncertainty quantification and improve how inference is approached for many data science problems. The PIs will actively use the project to recruit and train students, especially students from underrepresented groups, and will integrate the research output into teaching by developing topics courses for senior undergraduate and graduate students at their home university. The results will be disseminated through journal publications and conferences to broaden their understanding across different communities. R packages implementing the proposed methods will also be released to the public. The graduate student support will be used for interdisciplinary research and for writing code. Inherently discrete parameters and structures are prevalent in data science: for example, model indices in model selection problems, the number of clusters and cluster membership in classification, the number of layers and the architecture of deep neural network models, and connectivity, membership, and structure questions in network data. Making inference for discrete parameters and structures is known to be a difficult task.
A major challenge is that the large-sample central limit theorem (CLT) no longer holds, and a Bayesian analysis is highly sensitive to, and heavily influenced by, the choice of prior on the discrete model structure. This research project aims to develop a novel and general artificial-sample-based inferential framework, termed repro sampling. The idea of repro sampling is to create artificial samples by mimicking the sampling mechanism of the observed data and to study their performance; the artificial samples are then used to help quantify the uncertainty in estimating the model and its parameters. Repro sampling guarantees coverage in finite samples and can also be extended to large samples. The proposed approaches are expected to be broadly applicable, efficient, and computationally feasible. The main research goal is to fully develop the novel inferential framework of repro sampling. Three specific topics tailored to important and difficult inferential problems in data science will also be investigated: (A) model selection and inference in high-dimensional regression, nonparametric, and deep learning models; (B) predictive inference for high-dimensional regression and data science; and (C) finite-sample inference and fusion learning for rare-events data. The research will significantly advance statistical methodology for important yet challenging inference problems involving discrete parameters, and will broaden the applicability of uncertainty quantification to advanced machine learning methods. In addition, the research projects involve real databases and are well suited for engaging and training students and new researchers.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
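The repro sampling idea described in the abstract can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation or the interface of their R packages: it assumes a hypothetical model in which each observation is θ + N(0, 1) noise with θ restricted to the integers (a stand-in for a discrete parameter), uses the sample mean as the summary statistic, and approximates the α-level acceptance region by Monte Carlo. A candidate θ is kept if artificial samples generated under θ can reproduce the observed statistic, that is, if the observed mean falls inside the central α-interval of the artificial sample means. The function names `generate` and `repro_confidence_set` are hypothetical.

```python
import random
import statistics


def generate(theta, n, rng):
    """Hypothetical sampling mechanism G(theta, U): n draws of theta + N(0, 1) noise."""
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n)]


def repro_confidence_set(y_obs, candidates, alpha=0.95, n_artificial=2000, seed=0):
    """Return the candidate values of theta that can reproduce the observed mean.

    For each candidate theta, artificial samples of the same size as y_obs are
    generated by mimicking the assumed sampling mechanism. theta is kept if the
    observed sample mean lies inside the central alpha-level interval of the
    artificial sample means (approximated by Monte Carlo).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(y_obs)
    t_obs = statistics.fmean(y_obs)
    lower_q = (1.0 - alpha) / 2.0
    upper_q = 1.0 - lower_q
    kept = []
    for theta in candidates:
        # Summary statistic computed on each artificial sample generated under theta.
        t_art = sorted(
            statistics.fmean(generate(theta, n, rng)) for _ in range(n_artificial)
        )
        # Empirical quantiles via a simple index rule.
        lo = t_art[int(lower_q * (n_artificial - 1))]
        hi = t_art[int(upper_q * (n_artificial - 1))]
        if lo <= t_obs <= hi:
            kept.append(theta)
    return kept


if __name__ == "__main__":
    data_rng = random.Random(42)
    y = generate(3, n=20, rng=data_rng)  # simulated data with true theta = 3
    print(repro_confidence_set(y, candidates=range(0, 11)))
```

Because θ is restricted to integers in this toy setup, the resulting set may contain a single value or may even be empty; for the true θ, the probability of exclusion is roughly 1 − α, up to Monte Carlo error, without appealing to a large-sample approximation.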
Project Outcomes
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Discussion of Professor Bradley Efron’s Article on “Prediction, Estimation, and Attribution”
- DOI: 10.1111/insr.12415
- Published: 2020
- Journal:
- Impact factor: 2
- Authors: Xie, Min‐ge; Zheng, Zheshi
- Corresponding author: Zheng, Zheshi
Causal inference with invalid instruments: post-selection problems and a solution using searching and sampling
- DOI: 10.1093/jrsssb/qkad049
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Guo, Zijian
- Corresponding author: Guo, Zijian
Nonparametric Fusion Learning for Multiparameters: Synthesize Inferences From Diverse Sources Using Data Depth and Confidence Distribution
- DOI: 10.1080/01621459.2021.1902817
- Published: 2021
- Journal:
- Impact factor: 3.7
- Authors: Liu, Dungang; Liu, Regina Y.; Xie, Min-ge
- Corresponding author: Xie, Min-ge
Individualized Group Learning
- DOI: 10.1080/01621459.2021.1947306
- Published: 2019-06
- Journal:
- Impact factor: 3.7
- Authors: Chencheng Cai; Rong Chen; Min‐ge Xie
- Corresponding author: Chencheng Cai; Rong Chen; Min‐ge Xie
Leveraging the Fisher Randomization Test using Confidence Distributions: Inference, Combination and Fusion Learning
- DOI: 10.1111/rssb.12429
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Luo, Xiaokang; Dasgupta, Tirthankar; Xie, Minge; Liu, Regina Y.
- Corresponding author: Liu, Regina Y.
Other publications by Minge Xie
Additive effects among uterine paracrine factors in promoting bovine trophoblast cell proliferation
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Minge Xie
- Corresponding author: Minge Xie
Impact of measurement error on container inspection policies at port-of-entry
- DOI: 10.1007/s10479-010-0681-6
- Published: 2010-01-27
- Journal:
- Impact factor: 4.500
- Authors: Yada Zhu; Mingyu Li; Christina M. Young; Minge Xie; Elsayed A. Elsayed
- Corresponding author: Elsayed A. Elsayed
Utility of the Activity Measure for Post-Acute Care (AM-PAC) as a Measure of Functional Recovery Across the TBI Rehabilitation Continuum
- DOI: 10.1016/j.apmr.2025.01.371
- Published: 2025-04-01
- Journal:
- Impact factor: 3.700
- Authors: Monique Tremaine; Hayk Petrosyan; Minge Xie; Onrina Chandra; Shelby Hinchman
- Corresponding author: Shelby Hinchman
Other grants by Minge Xie
Unravel machine learning blackboxes -- A general, effective and performance-guaranteed statistical framework for complex and irregular inference problems in data science
- Award number: 2311064
- Fiscal year: 2023
- Amount: $258,600
- Award type: Standard Grant
ATD: Anomaly Detection with Confidence and Precision
- Award number: 2027855
- Fiscal year: 2020
- Amount: $258,600
- Award type: Standard Grant
Confidence Distribution (CD) and Efficient Approaches for Combining Inferences from Massive Complex Data
- Award number: 1513483
- Fiscal year: 2015
- Amount: $258,600
- Award type: Standard Grant
Conference on Advanced Statistical Methods for Underground Seismic Event Monitoring and Verification
- Award number: 1309312
- Fiscal year: 2013
- Amount: $258,600
- Award type: Standard Grant
New Developments on Confidence Distributions (CDs) and Statistical Inference: Theory, Methodology and Applications
- Award number: 1107012
- Fiscal year: 2011
- Amount: $258,600
- Award type: Continuing Grant
An Effective Methodology for Combining Information from Independent Sources with Applications to Social and Behavioral Sciences and Medical Research
- Award number: 0851521
- Fiscal year: 2009
- Amount: $258,600
- Award type: Standard Grant
ATD: Statistical Methods for Nuclear Material Surveillance Using Mobile Sensors
- Award number: 0915139
- Fiscal year: 2009
- Amount: $258,600
- Award type: Continuing Grant
New Developments in Longitudinal and Heterogeneous Data Analysis with Applications to the Social and Behavioral Sciences
- Award number: 0241859
- Fiscal year: 2003
- Amount: $258,600
- Award type: Standard Grant
Messy Data Modeling and Related Topics
- Award number: 9803273
- Fiscal year: 1998
- Amount: $258,600
- Award type: Standard Grant
Similar overseas grants
Development of a parent support program for preschool children using the experience sampling method
- Award number: 21K13556
- Fiscal year: 2021
- Amount: $258,600
- Award type: Grant-in-Aid for Early-Career Scientists
Development of automatic blood sampling method using non-contact estimation of blood vessel depth and force measurement
- Award number: 20K12686
- Fiscal year: 2020
- Amount: $258,600
- Award type: Grant-in-Aid for Scientific Research (C)
Studies on catching performance of neuston net for standardization of microplastics sampling method
- Award number: 20H03060
- Fiscal year: 2020
- Amount: $258,600
- Award type: Grant-in-Aid for Scientific Research (B)
Auto-Tuning Technique for Motor Drives by Synchronous Minor Sampling Method
- Award number: 20K04437
- Fiscal year: 2020
- Amount: $258,600
- Award type: Grant-in-Aid for Scientific Research (C)
A novel interpolation method for non-uniformed sampling data of nuclear magnetic resonance spectroscopy based on graph signal processing
- Award number: 20K23331
- Fiscal year: 2020
- Amount: $258,600
- Award type: Grant-in-Aid for Research Activity Start-up
Development of wavelet-based conditional sampling and averaging method and the application to the noise source identification of supersonic jet broadband noise
- Award number: 18H01621
- Fiscal year: 2018
- Amount: $258,600
- Award type: Grant-in-Aid for Scientific Research (B)
Study on the development of a method for rapidly sampling and analyzing PCBs in a room or working environment using yarns and risk evaluation
- Award number: 18K11686
- Fiscal year: 2018
- Amount: $258,600
- Award type: Grant-in-Aid for Scientific Research (C)
Development of simultaneous determination method of kinase inhibitors by means of dried blood spot sampling at home
- Award number: 18K14988
- Fiscal year: 2018
- Amount: $258,600
- Award type: Grant-in-Aid for Early-Career Scientists
EAGER: A New Method for Sampling Sea-Salt Aerosols
- Award number: 1762166
- Fiscal year: 2017
- Amount: $258,600
- Award type: Standard Grant
Targeting drug resistance: Identifying allosteric binding sites, exploiting conserved allosteric networks with enhanced sampling computational method
- Award number: 2326434
- Fiscal year: 2017
- Amount: $258,600
- Award type: Studentship