Statistical Modeling with High-dimensional Data: Variable Selection and Regularization

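Basic information

  • Award number:
    0706724
  • Principal investigator:
  • Amount:
    $102,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2007
  • Funding country:
    United States
  • Start and end dates:
    2007-06-01 to 2010-10-31
  • Project status:
    Completed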

Project abstract

With high-dimensional data, parsimonious models are preferred because they are much more interpretable and at the same time reduce prediction errors. Regularization is also an essential component of most modern developments in data analysis, in particular when the number of predictors is large. In that setting, non-regularized fitting gives badly over-fitted models of little use. The investigators take a regularization approach to the variable selection problem in high-dimensional statistical modeling, so that the resulting model enjoys excellent prediction accuracy and at the same time has a sparse representation. In particular, the investigators develop: (1) new fused variable selection methods for proteomics data analysis, which has been a revolutionary cancer diagnostic tool; (2) a novel kernel logistic regression model which automatically adopts a support-vector representation; (3) several new techniques for performing simultaneous variable selection in estimating multiple quantile regression functions. The investigators also study the theory of these new variable selection techniques. Efficient algorithms and software are developed for public use.

Modern scientific innovations allow scientists to collect massive and high-dimensional data. It is critical in scientific investigations to extract useful information from this huge amount of data. For this reason, variable selection and dimension reduction play a fundamental role in high-dimensional statistical modeling. Variable selection problems arise in a wide range of fields: machine learning, drug discovery, biomarker finding, genetics, proteomics, brain imaging analysis, financial modeling, and environmental sciences, to name a few. The research project aims to develop state-of-the-art statistical tools that help researchers in various fields analyze their data.
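
As a rough illustration of how a regularization penalty can yield a sparse model, the sketch below fits a lasso (L1-penalized least squares) by cyclic coordinate descent. The lasso is swapped in here as a standard textbook example; it is not the investigators' fused, kernel logistic, or quantile regression methods, and the function names `soft_threshold` and `lasso_coordinate_descent` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values in [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each, y a list of n responses.
    Illustrative only: no intercept, no standardization, fixed iteration count.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    # Mean squared value of each column, used to scale the coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual that excludes
            # column j's current contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy orthogonal design: one strong predictor and one weak predictor.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # [2.0, 0.0]
```

In this toy case the penalty shrinks the strong coefficient from 3 to 2 and sets the weak one exactly to zero, which is the sense in which an L1 penalty performs variable selection and estimation at once. Larger values of `lam` remove more variables.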

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Ming Yuan

Transition-Metal-Free Synthesis of Phenanthridinones from Biaryl-2-oxamic Acid under Radical Conditions
  • DOI:
  • Publication date:
    2015
  • Journal:
  • Impact factor:
    5.2
  • Authors:
    Ming Yuan;Li Chen;Junwei Wang;Shenjie Chen;Kongchao Wang;Yongbo Xue;Guangmin Yao;Zengwei Luo;Yonghui Zhang
  • Corresponding author:
    Yonghui Zhang
Breast Cancer Risk Prediction Using Electronic Health Records
Geochemical distortion on shale oil maturity caused by oil migration: Insights from the non-hydrocarbons revealed by FT-ICR MS
  • DOI:
    10.1016/j.coal.2022.104142
  • Publication date:
    2022-11
  • Journal:
  • Impact factor:
    5.6
  • Authors:
    Ming Yuan;Songqi Pan;Zhenhua Jing;Stefanie Poetz;Quan Shi;Yuanjia Han;Caineng Zou
  • Corresponding author:
    Caineng Zou
A Novel Red Electroluminescent Polymers Derived from Carbazole and 4,7-Bis(2-thienyl)-2,1,3-benzothiadiazole
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jian Huang;Yishe Xu;Qiong Hou;Wei Yang;Ming Yuan;Yong Cao
  • Corresponding author:
    Yong Cao
Genome-wide association mapping and candidate gene analysis for water-soluble protein concentration in soybean (Glycine max) based on high-throughput single nucleotide polymorphism markers
  • DOI:
    10.1071/cp19425
  • Publication date:
    2020-04
  • Journal:
  • Impact factor:
    1.9
  • Authors:
    Meinan Sui;Yue Wang;Zhihui Cui;Weili Teng;Ming Yuan;Wenbin Li;Xi Wang;Ruiqiong Li;Yan Lv;Ming Yan;Chao Quan;Xue Zhao;Yingpeng Han
  • Corresponding author:
    Yingpeng Han

Other grants held by Ming Yuan

FRG: Collaborative Research: Dynamic Tensors: Statistical Methods, Theory, and Applications
  • Award number:
    2052955
  • Fiscal year:
    2021
  • Funding amount:
    $102,000
  • Project type:
    Standard Grant
Complexity of High-Dimensional Statistical Models: An Information-Based Approach
  • Award number:
    2015285
  • Fiscal year:
    2020
  • Funding amount:
    $102,000
  • Project type:
    Continuing Grant
Collaborative Research: Statistical Methods, Algorithms, and Theory for Large Tensors
  • Award number:
    1721584
  • Fiscal year:
    2017
  • Funding amount:
    $102,000
  • Project type:
    Continuing Grant
Collaborative Research: Statistical Methods, Algorithms, and Theory for Large Tensors
  • Award number:
    1803450
  • Fiscal year:
    2017
  • Funding amount:
    $102,000
  • Project type:
    Continuing Grant
CAREER: Sparse Modeling and Estimation with High-dimensional Data
  • Award number:
    1321692
  • Fiscal year:
    2013
  • Funding amount:
    $102,000
  • Project type:
    Continuing Grant
FRG: Collaborative Research: Statistical Modeling and Inference of Vast Matrices for Complex Problems
  • Award number:
    1265202
  • Fiscal year:
    2013
  • Funding amount:
    $102,000
  • Project type:
    Continuing Grant
CAREER: Sparse Modeling and Estimation with High-dimensional Data
  • Award number:
    0846234
  • Fiscal year:
    2009
  • Funding amount:
    $102,000
  • Project type:
    Continuing Grant

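Similar National Natural Science Foundation of China (NSFC) grants

Galaxy Analytical Modeling Evolution (GAME) and cosmological hydrodynamic simulations.
  • Award number:
  • Approval year:
    2025
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal-level project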

Similar overseas grants

Classification of Ankle Osteoarthritis Severity from Weightbearing Computed Tomography Using Statistical Shape Modeling and Machine Learning
  • Award number:
    10525301
  • Fiscal year:
    2022
  • Funding amount:
    $102,000
  • Project type:
Classification of Ankle Osteoarthritis Severity from Weightbearing Computed Tomography Using Statistical Shape Modeling and Machine Learning
  • Award number:
    10669281
  • Fiscal year:
    2022
  • Funding amount:
    $102,000
  • Project type:
Multimodal Integrative Dimension Reduction and Statistical Modeling with Applications to Temporomandibular Joint (TMJ) Morphometry and Biomechanics
  • Award number:
    10196077
  • Fiscal year:
    2021
  • Funding amount:
    $102,000
  • Project type:
Multimodal Integrative Dimension Reduction and Statistical Modeling with Applications to Temporomandibular Joint (TMJ) Morphometry and Biomechanics
  • Award number:
    10366073
  • Fiscal year:
    2021
  • Funding amount:
    $102,000
  • Project type:
Development of novel statistical modeling based on functional data analysis for high-dimensional data and its application
  • Award number:
    20K11707
  • Fiscal year:
    2020
  • Funding amount:
    $102,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Statistical modeling of long-range chromatin interactions on gene regulation and underlying molecular
  • Award number:
    10172932
  • Fiscal year:
    2018
  • Funding amount:
    $102,000
  • Project type:
On Statistical Modeling and Parameter Estimation for High Dimensional Systems
  • Award number:
    1818674
  • Fiscal year:
    2017
  • Funding amount:
    $102,000
  • Project type:
    Standard Grant
On Statistical Modeling and Parameter Estimation for High Dimensional Systems
  • Award number:
    1612924
  • Fiscal year:
    2016
  • Funding amount:
    $102,000
  • Project type:
    Standard Grant
Collaborative Research: Statistical Modeling and Inference for High-dimensional Multi-Subject Neuroimaging Data
  • Award number:
    1209118
  • Fiscal year:
    2012
  • Funding amount:
    $102,000
  • Project type:
    Standard Grant
Collaborative Research: Statistical Modeling and Inference for High-dimensional Multi-Subject Neuroimaging Data
  • Award number:
    1208983
  • Fiscal year:
    2012
  • Funding amount:
    $102,000
  • Project type:
    Standard Grant