
RUI: Partially Observed Curves, and Big-Data Virtual Bootstrap


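Basic information

Award number:
1916161
Principal investigator:
Majid Mojirsheibani
Amount:
approximately $175,000
Awardee institution:
The University Corporation, Northridge
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-09-01 to 2023-08-31
Status:
Completed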

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1916161&HistoricalAwards=false
Keywords:
RUI Partially Observed Curves Big

Abstract

Many real data sets in scientific disciplines, such as the biomedical, engineering, and social sciences, contain missing, censored, or partially observed values, which can make statistical estimation and inference significantly more complicated. Part of this research project focuses on the development of new, flexible statistical methods for accurate prediction and inference in the presence of incomplete and missing data. Here, the data can be high-dimensional as well as functional, where each data value can be a curve. In another part of the project, the PI considers the development of new, efficient computer-intensive methods for Big-data scenarios, where the data size may be too large for classical approaches. Big data has been a research frontier in recent years, and interest in Big-data-driven decision-making procedures has been growing in both academia and industry. Many computational and theoretical challenges in this area still require new methodologies. The PI's new approaches will solve a number of important statistical problems at the intersection of machine learning and statistical inference.

The research deals with three broad classes of problems related to prediction and inference in some nonstandard setups. The first is the problem of functional classification when the covariate curves may be unobservable on some subsets of their domain. Unlike some of the earlier results in the literature, the PI's approach does not impose any missing-at-random (MAR) type assumptions on the mechanisms that cause the absence or censoring of information. The approach allows incomplete covariates to appear in new, unclassified curves as well as in the training data. Given the observed covariate fragments, the aim is to construct strongly consistent nonparametric classifiers based on local-averaging methods.
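To make the idea of a local-averaging classifier on curve fragments concrete, here is a minimal Python sketch. It is not the PI's method: the abstract gives no construction, so the function names, the Gaussian kernel, and the distance (root-mean-square difference over grid points where both curves are observed) are all assumptions made for illustration. In particular, restricting the distance to the overlapping region implicitly treats the missingness as ignorable, which is exactly the kind of assumption the project aims to avoid.

```python
import numpy as np

def fragment_distance(x, z):
    """RMS distance over grid points where both curves are observed.

    Unobserved values are encoded as NaN (an assumption of this sketch).
    """
    both = ~np.isnan(x) & ~np.isnan(z)
    if not both.any():
        return np.inf  # no overlap: this training curve will get zero weight
    return np.sqrt(np.mean((x[both] - z[both]) ** 2))

def kernel_classify(train_curves, train_labels, new_curve, bandwidth):
    """Kernel-weighted vote for a binary label in {0, 1}."""
    d = np.array([fragment_distance(c, new_curve) for c in train_curves])
    w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weights
    if w.sum() == 0.0:
        return 0  # arbitrary default when no training curve overlaps
    # Local average of the labels, thresholded at 1/2
    return int(np.dot(w, train_labels) / w.sum() > 0.5)
```

Each curve is assumed to be sampled on a common grid, so a curve is a 1-D array and the training set is a 2-D array with one row per curve. The bandwidth controls how local the average is and would need to be tuned, for example by cross-validation.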
The second class of problems deals with uniform asymptotics for kernel regression estimators in the presence of missing response variables. This is generally acknowledged to be a difficult problem. The limiting distribution of the maximal deviation of such estimators can be used to construct asymptotically correct uniform confidence bands, or to perform goodness-of-fit tests, for an unknown regression function. Here, the PI will consider both MAR and non-ignorable missing-response assumptions. The third set of problems focuses on the development of new weighted bootstrap methods for Big-data scenarios. The PI's approach aims to reduce the computational burden of repeatedly sampling Big data while retaining the benefits of the bootstrap methodology. The developed methods will be used to better approximate the sampling distribution of kernel and deconvolution density estimators, as well as their important functionals (such as sup- and Lp-norms), in the Big-data scenario.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
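As a rough illustration of the weighted bootstrap idea mentioned in the abstract, the Python sketch below approximates the distribution of the sup-norm deviation of a Gaussian kernel density estimate by reweighting the observations instead of resampling them. It is a generic sketch, not the PI's procedure: the abstract does not specify the weighting scheme, so the normalized exponential weights, the function name `weighted_bootstrap_sup`, and the grid-based supremum are all assumptions.

```python
import numpy as np

def weighted_bootstrap_sup(x, grid, h, n_boot=200, seed=0):
    """Return the KDE on `grid` and bootstrap draws of sup|f_star - f_hat|."""
    rng = np.random.default_rng(seed)
    # Kernel matrix computed once: entry (j, i) is K_h(grid[j] - x[i])
    u = (grid[:, None] - x[None, :]) / h
    kmat = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
    f_hat = kmat.mean(axis=1)  # ordinary (equal-weight) KDE
    # Random weights per replicate: Exp(1) draws normalized to sum to one
    w = rng.exponential(1.0, size=(n_boot, x.shape[0]))
    w /= w.sum(axis=1, keepdims=True)
    f_star = w @ kmat.T  # weighted KDEs, shape (n_boot, len(grid))
    sups = np.abs(f_star - f_hat).max(axis=1)  # sup taken over the grid only
    return f_hat, sups

# Example with simulated data (illustrative values only)
data = np.random.default_rng(1).standard_normal(500)
grid = np.linspace(-3.0, 3.0, 61)
f_hat, sups = weighted_bootstrap_sup(data, grid, h=0.3)
print("95% quantile of the sup deviation:", np.quantile(sups, 0.95))
```

Because the kernel matrix is computed once and each replicate is a single matrix-vector product, no replicate revisits the raw data. The quantile of `sups` could serve as the half-width of a uniform band around `f_hat`, although such a band ignores smoothing bias and covers the smoothed density rather than the density itself.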
