
Repro Sampling Method: A Transformative Artificial-Sample-Based Inferential Framework with Applications to Discrete Parameter, High-Dimensional Data, and Rare Events Inferences


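Basic Information

Award number:
2015373
Principal investigator:
Minge Xie
Amount:
approx. $258,600
Host institution:
Rutgers University New Brunswick
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Project period:
2020-07-01 to 2023-06-30
Project status:
Completed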

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2015373&HistoricalAwards=false
Keywords:
Repro Sampling Method Transformative Artificial

Project Abstract

In the era of data science, statistical inference is the cornerstone of extracting useful information from complex data sets. Despite significant progress in statistics, many challenges remain in quantifying uncertainty for complex and high-dimensional data. For instance, inherently discrete parameters and model structures are routinely encountered in data science and machine learning problems, and conventional statistical inference approaches do not apply to these intrinsically discrete problems. This project aims to develop a new inferential framework that addresses statistical inference questions for such difficult problems, including those arising in high-dimensional and rare-events data analyses. The development of the framework will be transformative, since it will greatly expand the reach of statistical inference and uncertainty quantification and improve how inference is approached in many data science problems. The PIs will actively use the project to recruit and train students, especially underrepresented students, and will integrate the research output into teaching by developing topic courses for senior undergraduate and graduate students at their home university. The results will be disseminated through journal publications and conferences to enhance understanding of the results in different communities. R packages for the proposed methods will also be released to the public. The graduate student support will be used for interdisciplinary research and writing code.

Inherently discrete parameters and structures are prevalent in data science: for example, model indices in model selection problems; the number of clusters and membership in classification; the number of layers and the structure of deep neural network models; and connectivity, membership, and structure questions in network data. Making inference for discrete parameters and structures is known to be a difficult task. A major challenge is that the large-sample central limit theorem (CLT) no longer holds, and a Bayesian analysis is highly sensitive to the choice of prior on the discrete model structure. This research project aims to develop a novel and general artificial-sample-based inferential framework, termed repro sampling. The idea of repro sampling is to create and study the performance of artificial samples generated by mimicking the sampling mechanism of the observed data; the artificial samples are then used to help quantify the uncertainty in the estimation of models and parameters. Repro sampling will guarantee the coverage property in finite samples and can also be extended to large samples. The proposed approaches are expected to be broadly applicable, efficient, and computationally feasible. The main research goal is to fully develop the novel inferential framework of repro sampling. Three specific topics tailored to important and difficult inferential problems in data science will also be investigated: (A) model selection and inference in high-dimensional regression, nonparametric, and deep learning models; (B) predictive inference for high-dimensional regression and data science; (C) finite-sample inference and fusion learning for rare-events data. The research will significantly advance statistical methodology for the important yet challenging inference problems involving discrete parameters, and will broaden the applicability of uncertainty quantification to advanced machine learning methods. In addition, the research projects involve real databases and are ideally suited for engaging and training students and new researchers.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
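The abstract describes repro sampling only in words: generate artificial samples by mimicking the sampling mechanism of the observed data, then use them to quantify uncertainty. The following is a minimal Python sketch of that general idea for a toy discrete-parameter problem. It is not the project's actual procedure or its R packages. Everything specific here is an assumption made for illustration: the model Y_i = theta + U_i with integer theta and standard normal noise, the use of the sample mean as the summary statistic, the Monte Carlo quantile rule, and the hypothetical function name `repro_confidence_set`. Because the acceptance region is estimated by simulation, the sketch only approximates the nominal coverage level.

```python
import random
import statistics


def repro_confidence_set(observed, candidates, level=0.95,
                         n_artificial=2000, seed=0):
    """Keep each candidate theta under which the observed sample mean
    falls inside the central `level` range of artificial sample means.

    Assumed toy model (not from the award abstract):
    Y_i = theta + U_i, theta an integer, U_i ~ N(0, 1).
    """
    rng = random.Random(seed)
    n = len(observed)
    observed_mean = statistics.fmean(observed)
    tail = (1.0 - level) / 2.0
    kept = []
    for theta in candidates:
        # Generate artificial samples by mimicking the assumed mechanism
        # and record the summary statistic of each one.
        artificial_means = sorted(
            statistics.fmean([theta + rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_artificial)
        )
        # Empirical central region of the artificial summary statistics.
        lower = artificial_means[int(tail * n_artificial)]
        upper = artificial_means[int((1.0 - tail) * n_artificial) - 1]
        if lower <= observed_mean <= upper:
            kept.append(theta)
    return kept


# Illustrative data only: 13 evenly spaced values centred at 3.0.
observed = [3.0 + 0.25 * k for k in range(-6, 7)]
print(repro_confidence_set(observed, range(0, 7)))
```

The returned list is a set of integers rather than an interval, which is the point of the toy example: no central limit theorem for an estimator of theta is invoked, and the candidate values are simply screened one by one against artificial data. With the illustrative data above, only the candidate 3 should be retained, since the neighbouring integers place the observed mean far outside the simulated range.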
