RUI: Partially Observed Curves, and Big-Data Virtual Bootstrap

Project Summary

Many real data sets in scientific disciplines such as the biomedical, engineering, and social sciences contain missing, censored, or partially observed values, which can make statistical estimation and inference significantly more complicated. Part of this research project focuses on developing new flexible statistical methods for accurate prediction and inference in the presence of incomplete and missing data. Here, the data may be high-dimensional as well as functional, where each data value can be a curve. In another part of the project, the PI considers the development of new, efficient computer-intensive methods for Big-data scenarios, where the data size may be too large for classical approaches. Big-data has been a research frontier in recent years, and there is growing interest in Big-data-driven decision-making procedures in both academia and industry. Many computational and theoretical challenges in this area still require new methodologies. The PI's new approaches will solve a number of important statistical problems at the intersection of machine learning and statistical inference.

The research deals with three broad classes of problems related to prediction and inference in some nonstandard setups. These include the problem of functional classification when the covariate curves may be unobservable on some subsets of their domain. Unlike some of the earlier results in the literature, the PI's approach does not impose any missing-at-random (MAR) type assumptions on the mechanisms that cause the absence or censoring of information. The approach allows incomplete covariates to appear in the new unclassified curves as well as in the data. Given the observed covariate fragments, the aim is to construct strongly consistent nonparametric classifiers based on local-averaging methods.

The second class of problems deals with uniform asymptotics for kernel regression estimators in the presence of missing response variables, which is generally acknowledged to be a difficult problem. The limiting distribution of the maximal deviation of such estimators can be used to construct asymptotically correct uniform confidence bands, or to perform goodness-of-fit tests, for an unknown regression function. Here, the PI will consider both MAR and non-ignorable missing response assumptions.

The third set of problems focuses on the development of new weighted bootstrap methods for Big-data scenarios. The PI's approach aims to reduce the computational burden associated with the repeated sampling of Big-data while still retaining the benefits of bootstrap methodology. The developed methods will be used to better approximate the sampling distribution of kernel and deconvolution density estimators, as well as their important functionals (such as sup- and Lp-norms), in the Big-data scenario.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
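
As a rough illustration of the kernel regression setting with missing responses mentioned in the summary, the following Python sketch computes a Nadaraya-Watson estimate from the complete cases only. It is not the PI's method: the function names, the Gaussian kernel, the bandwidth, and the simulated data are all assumptions made for illustration, and the complete-case approach is only reasonable here because the toy missingness mechanism depends on the covariate alone (an MAR-type setup).

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_complete_case(x0, xs, ys, observed, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0] using only observed responses.

    xs       : covariate values (always observed)
    ys       : responses; entries with observed[i] == False are ignored
    observed : indicators, True when the response was recorded
    """
    num = 0.0
    den = 0.0
    for x, y, d in zip(xs, ys, observed):
        if not d:
            continue  # skip cases whose response is missing
        w = gaussian_kernel((x - x0) / bandwidth)
        num += w * y
        den += w
    if den == 0.0:
        raise ValueError("no usable observed responses near x0")
    return num / den


def simulate(n, seed=0):
    """Toy data: Y = sin(2*pi*X) + noise, with missingness depending on X only."""
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.random()
        y = math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.2)
        p_obs = 0.9 - 0.5 * x  # hypothetical MAR mechanism: depends on X alone
        xs.append(x)
        ys.append(y)
        observed.append(rng.random() < p_obs)
    return xs, ys, observed


xs, ys, observed = simulate(n=2000, seed=0)
# The true regression function equals 1.0 at x = 0.25.
estimate = nw_complete_case(0.25, xs, ys, observed, bandwidth=0.05)
```

Under a non-ignorable mechanism, where the chance of observing the response depends on the response itself, this complete-case estimate would generally be biased, which is one reason that case needs separate treatment.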

Project Outcomes

Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
On histogram-based regression and classification with incomplete data
  • DOI: 10.1007/s00184-020-00794-y
  • Published: 2020-08-19
  • Impact factor: 0.7
  • Authors: Han, Eric; Mojirsheibani, Majid
  • Corresponding author: Mojirsheibani, Majid
A nearest-neighbor-based ensemble classifier and its large-sample optimality
On classification with nonignorable missing data
  • DOI: 10.1016/j.jmva.2021.104755
  • Published: 2021-03
  • Impact factor: 0
  • Authors: M. Mojirsheibani
  • Corresponding author: M. Mojirsheibani
On regression and classification with possibly missing response variables in the data
  • DOI: 10.1007/s00184-023-00923-3
  • Published: 2022-12
  • Impact factor: 0.7
  • Authors: M. Mojirsheibani; William Pouliot; Andre Shakhbandaryan
  • Corresponding authors: M. Mojirsheibani; William Pouliot; Andre Shakhbandaryan
A note on the performance of bootstrap kernel density estimation with small re-sample sizes
  • DOI: 10.1016/j.spl.2021.109189
  • Published: 2021
  • Impact factor: 0.8
  • Authors: Mojirsheibani, Majid
  • Corresponding author: Mojirsheibani, Majid

Other publications by Majid Mojirsheibani

On the correct regression function (in $L_2$) and its applications when the dimension of the covariate vector is random
  • DOI: 10.1016/j.jspi.2012.03.017
  • Published: 2012-09-01
  • Authors: Majid Mojirsheibani
  • Corresponding author: Majid Mojirsheibani
On the $L_p$ norms of kernel regression estimators for incomplete data with applications to classification
  • DOI: 10.1007/s10260-016-0359-6
  • Published: 2016-04-05
  • Impact factor: 0.800
  • Authors: Timothy Reese; Majid Mojirsheibani
  • Corresponding author: Majid Mojirsheibani
A Note on the Strong Approximation of the Smoothed Empirical Process of α-mixing Sequences

Other grants held by Majid Mojirsheibani

RUI: Predictive models with Incomplete and Fragmented Observations, and New Advances in Virtual Re-sampling for Big Data
  • Award number: 2310504
  • Fiscal year: 2023
  • Funding amount: $175,000
  • Project type: Standard Grant
RUI: Classification, regression, and density estimation with missing variables
  • Award number: 1407400
  • Fiscal year: 2014
  • Funding amount: $175,000
  • Project type: Continuing Grant

Similar NSFC grants

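Spatial solitons in PT- and partially-PT-symmetric nonlinear systems with fractional-order diffraction
  • Award number: 11764022
  • Approval year: 2017
  • Funding amount: CNY 330,000
  • Project type: Regional Science Fund Project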

Similar overseas grants

Inference and computational methods for regression models in the presence of partially observed network data or high-dimensional capture-recapture data
  • Award number: RGPIN-2022-03309
  • Fiscal year: 2022
  • Funding amount: $175,000
  • Project type: Discovery Grants Program - Individual
Inference and computational methods for regression models in the presence of partially observed network data or high-dimensional capture-recapture data
  • Award number: DGECR-2022-00441
  • Fiscal year: 2022
  • Funding amount: $175,000
  • Project type: Discovery Launch Supplement
Advancing statistical inference for correlated and partially observed data
  • Award number: RGPIN-2018-05447
  • Fiscal year: 2022
  • Funding amount: $175,000
  • Project type: Discovery Grants Program - Individual
Partially Observed Systems in Finance: Statistical Inference and Optimization
  • Award number: 2205751
  • Fiscal year: 2022
  • Funding amount: $175,000
  • Project type: Standard Grant
Advancing statistical inference for correlated and partially observed data
  • Award number: RGPIN-2018-05447
  • Fiscal year: 2021
  • Funding amount: $175,000
  • Project type: Discovery Grants Program - Individual
Graphical Models from Partially Observed Interactions with Biomedical Applications
  • Award number: 10452586
  • Fiscal year: 2020
  • Funding amount: $175,000
Advancing statistical inference for correlated and partially observed data
  • Award number: RGPIN-2018-05447
  • Fiscal year: 2020
  • Funding amount: $175,000
  • Project type: Discovery Grants Program - Individual
Graphical Models from Partially Observed Interactions with Biomedical Applications
  • Award number: 10215569
  • Fiscal year: 2020
  • Funding amount: $175,000
Advancing statistical inference for correlated and partially observed data
  • Award number: RGPIN-2018-05447
  • Fiscal year: 2019
  • Funding amount: $175,000
  • Project type: Discovery Grants Program - Individual
Advancing statistical inference for correlated and partially observed data
  • Award number: RGPIN-2018-05447
  • Fiscal year: 2018
  • Funding amount: $175,000
  • Project type: Discovery Grants Program - Individual