RUI: Predictive models with Incomplete and Fragmented Observations, and New Advances in Virtual Re-sampling for Big Data

Basic information

Project abstract

A major focus of this project is the development of new procedures for statistical modeling, prediction, and inference in the presence of missing data. Incomplete, missing, censored, and partially observed data are prevalent in many areas of the medical sciences, engineering, economics, and the social sciences, and they can complicate prediction and inference in data-driven decision-making processes. The investigator will study the effectiveness of several new methods for handling missing values in complex data structures without imposing unrealistic or unnecessarily stringent conditions on the underlying mechanisms that cause the absence of information. Another major aim of this research project is to develop efficient data re-sampling methods that alleviate the formidable computational cost of computer-intensive statistical methods in big-data scenarios, where the data analyst must deal with, and sort through, massive amounts of data. Such efficient methods are timely, as ultra-large datasets now drive many data-analytic initiatives in medicine, agriculture, and environmental protection. Additionally, this project provides research experiences for graduate and undergraduate students, many of whom will then be encouraged to move on to further studies and research careers in STEM disciplines.

This research project deals with two broad classes of problems related to predictive models and inference. The first part focuses on selected topics in predictive models, such as regression and classification, for a number of nonstandard realistic setups. Specifically, the investigator will develop several local-averaging-type regression estimators in general metric spaces for incomplete and fractionally observed data, with applications to statistical classification and the related problem of unsupervised machine learning. The aim is to carry out a rigorous study of the convergence properties of these estimators in various norms, which is necessary for correct prediction and inference. In particular, this project will study and develop new exponential performance bounds for the Lp norms of the proposed estimators. The problem of bandwidth estimation for incomplete and fragmented functional data will also be studied; this is particularly important because the bandwidth that minimizes quantities such as the MISE or ISE is not necessarily optimal for classification. The second part of this research plan considers new objectives in virtual re-sampling as a method to reduce the formidable computational cost of big-data bootstrap in a number of important and challenging problems, while still retaining the benefits of bootstrap methodology. In particular, the investigator will develop virtual re-sampling strategies to (i) approximate the distribution of several refined higher criticism statistics for multiple testing problems in big-data scenarios, and (ii) speed up the logarithmically slow rates of convergence of important functionals of density and regression estimators in two-sample problems, such as those based on deconvolution density estimators and their sup-functionals for errors-in-variables models in big-data scenarios. To achieve the objectives under (i) and (ii), the investigator will adapt the methodologies used in the literature on strong approximations of bootstrap empirical processes.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Majid Mojirsheibani

On the correct regression function (in $L_2$) and its applications when the dimension of the covariate vector is random
  • DOI:
    10.1016/j.jspi.2012.03.017
  • Publication date:
    2012-09-01
  • Authors:
    Majid Mojirsheibani
  • Corresponding author:
    Majid Mojirsheibani
On the $L_p$ norms of kernel regression estimators for incomplete data with applications to classification
  • DOI:
    10.1007/s10260-016-0359-6
  • Publication date:
    2016-04-05
  • Impact factor:
    0.800
  • Authors:
    Timothy Reese; Majid Mojirsheibani
  • Corresponding author:
    Majid Mojirsheibani
A Note on the Strong Approximation of the Smoothed Empirical Process of α-mixing Sequences

Other grants awarded to Majid Mojirsheibani

RUI: Partially Observed Curves, and Big-Data Virtual Bootstrap
  • Award number:
    1916161
  • Fiscal year:
    2019
  • Award amount:
    $200,000
  • Project type:
    Standard Grant
RUI: Classification, regression, and density estimation with missing variables
  • Award number:
    1407400
  • Fiscal year:
    2014
  • Award amount:
    $200,000
  • Project type:
    Continuing Grant

Similar overseas grants

PharmaCrystNet: Improving the Predictive Capabilities of Crystallisation Models in Pharma
  • Award number:
    EP/Z533014/1
  • Fiscal year:
    2024
  • Award amount:
    $200,000
  • Project type:
    Research Grant
Robust, Trustworthy and Explainable Predictive Models for Low Carbon Power and Energy
  • Award number:
    2889082
  • Fiscal year:
    2023
  • Award amount:
    $200,000
  • Project type:
    Studentship
Machine Learning and Multiomics for Predictive Models and Biomarker Discovery in Preterm Infants.
  • Award number:
    10729640
  • Fiscal year:
    2023
  • Award amount:
    $200,000
Ideas Lab: Collaborative Research: Integrating cross-kingdom lncRNA genetic and functional interactions to build predictive network models
  • Award number:
    2243562
  • Fiscal year:
    2023
  • Award amount:
    $200,000
  • Project type:
    Standard Grant
III: Small: RUI: A Fairness Auditing Framework for Predictive Mobility Models
  • Award number:
    2304213
  • Fiscal year:
    2023
  • Award amount:
    $200,000
  • Project type:
    Standard Grant
EDGE CMT: deleterious recessive variation - from experimental data to predictive models
  • Award number:
    10675239
  • Fiscal year:
    2023
  • Award amount:
    $200,000
Fair risk profiles and predictive models for outcomes of obstructive sleep apnea through electronic medical record data
  • Award number:
    10678108
  • Fiscal year:
    2023
  • Award amount:
    $200,000
Predictive multi-scale model of focal adhesion-based durotaxis
  • Award number:
    10798520
  • Fiscal year:
    2023
  • Award amount:
    $200,000
CAREER: Unified Model-agnostic Interpretation Framework for Deep Predictive Models
  • Award number:
    2238700
  • Fiscal year:
    2023
  • Award amount:
    $200,000
  • Project type:
    Continuing Grant
Predictive Models of Beryllium Sensitization and Chronic Beryllium Disease
  • Award number:
    10736862
  • Fiscal year:
    2023
  • Award amount:
    $200,000