CAREER: High-Dimensional Variable Selection in Nonlinear Models and Classification with Correlated Data

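Basic information

  • Award number:
    1150318
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Project period:
    2012-08-01 to 2017-07-31
  • Project status:
    Completed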

Project abstract

Estimation and prediction with large-scale data sets commonly arise in statistics and related fields and pose great challenges. To address these challenges, four interrelated research topics are proposed for investigation. First, the investigator proposes robust variable selection methods for heavy-tailed data in the ultra-high-dimensional setting, where dimensionality increases exponentially with the sample size. To address the heavy-tailedness, regularization methods with robust losses and general penalty functions are investigated in various model settings. The risk properties of these methods are studied, and the optimality of the penalty and loss functions is characterized. Robust independence screening methods are also proposed and studied. Second, variable selection in high-dimensional functional regression models with functional predictors and/or a functional response is investigated. Model fitting procedures are proposed, and the sampling properties of the proposed methods are thoroughly investigated. Third, the investigator studies regularization parameter selection in penalized empirical risk minimization for both correctly specified and misspecified models in ultra-high dimensions. The appropriate tradeoff between model fit and model complexity is characterized. This study also answers the question of whether conventional model selection criteria such as AIC and BIC continue to work in ultra-high dimensions. Fourth, high-dimensional classification with correlated features is extensively studied under the unified framework of thresholding classification rules, and the optimal choice of threshold that minimizes the classification error is identified. The investigator studies Gaussian classification and generalizes the methods and results to the case of correlated discrete features.
Thanks to the advent of modern technologies such as handwritten digit recognition and single-nucleotide polymorphism (SNP) genotyping experiments, massive data sets with a large number of variables are becoming more and more common in scientific fields such as computational biology, economics, finance, machine learning, and climatology. Analyzing these data sets effectively poses great challenges in both methodology and computation that are not present in smaller-scale studies. A major goal of this proposal is to develop new or extended methodologies for high-dimensional model building and model evaluation in various regression and classification settings, and to investigate their sampling properties in depth and breadth. The PI has broad research interests in many fields outside statistics, such as computational biology, finance, econometrics, and machine learning. The proposed methods will be tested on real data sets and extended to these different areas. In addition, the PI plans to develop software packages that implement the proposed methods and to make them publicly available. The proposed work will benefit a broad range of scientists and researchers in various fields. The PI also plans to integrate education activities with the proposed research, such as involving minority students, undergraduate students, and graduate students in the proposed projects and incorporating cutting-edge high-dimensional statistical methods into new courses.
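
The abstract names robust independence screening but does not give an algorithm. As a rough illustration only, the sketch below ranks each predictor by its absolute Spearman rank correlation with the response and keeps the top k. The use of Spearman correlation as the marginal statistic, the function name `rank_screen`, and the simulated data are all assumptions made for this example; they are not taken from the award record and should not be read as the investigator's method.

```python
import numpy as np


def rank_screen(X, y, k):
    """Marginal screening sketch: keep the k columns of X whose ranks
    correlate most strongly (in absolute value) with the ranks of y.

    Ranks are computed by double argsort, so ties are broken arbitrarily;
    this is adequate for continuous data.
    """
    rx = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(axis=0)  # center the ranks of every column
    ry -= ry.mean()
    # Pearson correlation of the ranks equals Spearman's rho.
    rho = (rx * ry[:, None]).sum(axis=0) / np.sqrt(
        (rx ** 2).sum(axis=0) * (ry ** 2).sum()
    )
    keep = np.argsort(-np.abs(rho))[:k]  # indices of the k largest |rho|
    return keep, rho


# Hypothetical simulated data: p >> n, two active predictors,
# heavy-tailed (Student t, 2 degrees of freedom) noise.
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 3 * X[:, 1] + rng.standard_t(df=2, size=n)

selected, rho = rank_screen(X, y, k=20)
print("columns kept:", np.sort(selected))
```

Because ranks are bounded, a few extreme responses cannot dominate the statistic, which is the intuition behind preferring a rank-based marginal measure when the noise is heavy-tailed. A penalized fit with a robust loss would then typically be run on the retained columns only.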

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Yingying Fan

Human Vγ9Vδ2-T cells efficiently kill influenza virus-infected lung alveolar epithelial cells
  • DOI:
  • Year published:
  • Journal:
  • Impact factor:
    24.1
  • Authors:
    Hong Li; Wenwei Tu; Zheng Xiang; Ting Feng; Jinrong Li; Yingying Fan; Qiao Lu; Zhongwei Yin; Meixing Yu; Chongyang Shen
  • Corresponding author:
    Chongyang Shen
Effect of β,β-Dimethylacrylshikonin on Inhibition of Human Colorectal Cancer Cell Growth in Vitro and in Vivo
  • DOI:
  • Year published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yingying Fan; Shaoju Jin; Jun He; Zhenjun Shao; Jiao Yan; Ting Feng; Hong Li
  • Corresponding author:
    Hong Li
Asymptotic properties of high-dimensional random forests
  • DOI:
  • Year published:
    2020
  • Journal:
  • Impact factor:
    4.5
  • Authors:
    Chien; Patrick Vossler; Yingying Fan; Jinchi Lv
  • Corresponding author:
    Jinchi Lv
Lipid composition and oxidative changes in diabetes and alcoholic diabetes rats
  • DOI:
  • Year published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lin Qin; Shaik Althaf Hussain; N. Maddu; Chinna Padamala Manjuvani; Venkata Subba Reddy Gangireddygari; Yingying Fan
  • Corresponding author:
    Yingying Fan
Estimation of weak factor models
  • DOI:
  • Year published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yingying Fan; Jinchi Lv; Mahrad Sharifvaghefi; Yoshimasa Uematsu
  • Corresponding author:
    Yoshimasa Uematsu

Other grants held by Yingying Fan

High-Dimensional Random Forests Learning, Inference, and Beyond
  • Award number:
    2310981
  • Fiscal year:
    2023
  • Amount:
    $400,000
  • Project type:
    Standard Grant
FRG: Collaborative Research: Flexible Network Inference
  • Award number:
    2052964
  • Fiscal year:
    2021
  • Amount:
    $400,000
  • Project type:
    Standard Grant
Regularization Methods in High Dimensions with Applications to Functional Data Analysis, Mixed Effects Models and Classification
  • Award number:
    0906784
  • Fiscal year:
    2009
  • Amount:
    $400,000
  • Project type:
    Continuing Grant

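Similar grants from the National Natural Science Foundation of China (NSFC)

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Award number:
  • Year approved:
    2024
  • Amount (in units of 10,000 CNY):
  • Project type:
    Collaborative Innovation Research Team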

Similar overseas grants

Variable Selection and Prediction for High-Dimensional Genetic Data with Complex Structures
  • Award number:
    RGPIN-2020-05133
  • Fiscal year:
    2022
  • Amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Latent variable modeling of complex high-dimensional data
  • Award number:
    RGPIN-2019-05915
  • Fiscal year:
    2022
  • Amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Variable Selection and Prediction for High-Dimensional Genetic Data with Complex Structures
  • Award number:
    RGPIN-2020-05133
  • Fiscal year:
    2021
  • Amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Latent variable modeling of complex high-dimensional data
  • Award number:
    RGPIN-2019-05915
  • Fiscal year:
    2021
  • Amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Variable Selection and Prediction for High-Dimensional Genetic Data with Complex Structures
  • Award number:
    RGPIN-2020-05133
  • Fiscal year:
    2020
  • Amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Bayesian variable selection in high-dimensional latent variable models
  • Award number:
    2467837
  • Fiscal year:
    2020
  • Amount:
    $400,000
  • Project type:
    Studentship
Latent variable modeling of complex high-dimensional data
  • Award number:
    RGPIN-2019-05915
  • Fiscal year:
    2020
  • Amount:
    $400,000
  • Project type:
    Discovery Grants Program - Individual
Variable Selection and Prediction for High-Dimensional Genetic Data with Complex Structures
  • Award number:
    DGECR-2020-00344
  • Fiscal year:
    2020
  • Amount:
    $400,000
  • Project type:
    Discovery Launch Supplement
Latent variable modeling of complex high-dimensional data
  • Award number:
    DGECR-2019-00345
  • Fiscal year:
    2019
  • Amount:
    $400,000
  • Project type:
    Discovery Launch Supplement
Non-Convex Landscapes and High-Dimensional Latent Variable Models
  • Award number:
    1916198
  • Fiscal year:
    2019
  • Amount:
    $400,000
  • Project type:
    Standard Grant