High-Dimensional Inference beyond Linear Models

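Basic Information

Award number:
1915711
Principal investigator:
Bin Nan
Amount:
$200,000
Institution:
University of California-Irvine
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-10-01 to 2022-09-30
Status:
Completed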

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1915711&HistoricalAwards=false
Keywords:
Dimensional Inference beyond Linear Models

Project Abstract

Regression models are widely used to investigate the associations between a set of predictor variables, the so-called covariates, and an outcome variable. Estimates of regression coefficients and their confidence intervals provide useful information, for example, about the importance of certain genetic variants to lung cancer, or about brain regions associated with memory loss in an aging population. With the advent of the big data era, regression models with many covariates are commonly used to tackle important scientific problems in areas such as genomics, neuroimaging, business, engineering, information technology, and other biomedical studies, and sometimes the number of covariates (e.g., genetic variants) is even greater than the sample size (e.g., the number of study participants). Making statistical inference (i.e., constructing confidence intervals for regression coefficients) for a large number of covariates is challenging because conventional estimators such as the maximum likelihood estimator may either not exist or be biased. It has been shown in recent years that the regression coefficients can be estimated using regularized methods, e.g., the lasso. However, it is also well known that regularized methods yield biased estimates and thus cannot be used directly for statistical inference, in particular for constructing confidence intervals. Some researchers have shown that proper statistical inference can be made in linear regression models after applying a de-biasing procedure. However, it has also been found that the de-biased method does not work beyond linear models. Without imposing restrictive assumptions, this project will develop theory and methods for generalized linear models and the Cox regression model with a large number of covariates, as well as for functional regression models with applications in brain imaging studies.
Proper distributional theory and confidence intervals will be provided, which will lead to more reliable results in scientific research. The existing de-biased methods do not successfully correct the bias in nonlinear models, e.g., generalized linear models or the Cox model, leading to poor results in statistical inference. The main causes of the problem are the unrealistic sparsity assumption imposed on the inverse expected Hessian matrix and the fact that the "negligible" terms in the existing de-biased methods are not actually negligible. In this project, two methods that further de-bias the lasso estimators without relying on the assumption of a sparse inverse expected Hessian matrix will be considered: (i) directly inverting the Hessian matrix when the number of regression parameters is less than the sample size; (ii) eliminating the major bias term without using the inverse of the Hessian matrix, via a quadratic programming approach that can potentially handle the case where the number of regression parameters exceeds the number of observations. Additional challenges arise in Cox regression with high-dimensional covariates, where the partial-likelihood-based loss functions for the observations are not i.i.d. and each loss function is not Lipschitz. The proposed method approximates the loss function to yield i.i.d. losses and will be extended to handle multivariate and clustered survival data with even more complicated loss functions. For the brain imaging data, a functional regression model using the Haar wavelet basis is investigated. The major added challenge is to characterize the impact of the approximation error from using Haar wavelets on the asymptotic distribution of the refined de-biased functional estimator. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
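
As an illustration of approach (i) in the abstract, the following is a minimal sketch, not the project's actual implementation. It assumes a logistic regression model as the generalized linear model, fits the lasso with a simple proximal-gradient loop, and then applies a one-step correction using the directly inverted sample Hessian, which requires fewer parameters than observations. The function names, the penalty level `lam`, the simulated data, and the plug-in standard errors (which assume a correctly specified model) are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lasso_logistic(X, y, lam, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the Hessian of the average loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

def debias_by_hessian_inversion(X, y, beta_hat, z_crit=1.96):
    """One-step de-biasing with the directly inverted sample Hessian (needs p < n)."""
    n, p = X.shape
    if p >= n:
        raise ValueError("direct Hessian inversion needs p < n")
    mu = sigmoid(X @ beta_hat)
    grad = X.T @ (mu - y) / n                  # gradient of the average loss
    weights = mu * (1.0 - mu)
    hess = (X * weights[:, None]).T @ X / n    # Hessian of the average loss
    hess_inv = np.linalg.inv(hess)
    beta_db = beta_hat - hess_inv @ grad       # bias-corrected estimate
    # Plug-in standard errors; assumes the logistic model is correctly specified.
    se = np.sqrt(np.diag(hess_inv) / n)
    return beta_db, beta_db - z_crit * se, beta_db + z_crit * se

# Simulated data (hypothetical): 500 observations, 20 covariates, 3 true signals.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)

beta_hat = lasso_logistic(X, y, lam=0.05)
beta_db, lower, upper = debias_by_hessian_inversion(X, y, beta_hat)
for j in range(3):
    print(f"beta[{j}]: lasso={beta_hat[j]:.3f}, de-biased={beta_db[j]:.3f}, "
          f"95% CI=({lower[j]:.3f}, {upper[j]:.3f})")
```

The sketch does not cover approach (ii): when the number of parameters exceeds the sample size, the sample Hessian is singular, which is why the abstract proposes a quadratic programming alternative that avoids the inverse.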

Project Outcomes

Journal articles (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Structured Brain-wide and Genome-wide Association Study Using ADNI PET Images.

DOI:
10.1002/cjs.11605
Publication date:
2021-03
Journal:
The Canadian journal of statistics = Revue canadienne de statistique
Impact factor:
0
Authors:
Li Y; Nan B; Zhu J; Alzheimer’s Disease Neuroimaging Initiative
Corresponding author:
Alzheimer’s Disease Neuroimaging Initiative

Other publications by Bin Nan

A Robust Error-Resistant View Selection Method for 3D Reconstruction

DOI:
Publication date:
2024
Journal:
arXiv.org
Impact factor:
0
Authors:
Shaojie Zhang; Yinghui Wang; Bin Nan; Wei Li; Jinlong Yang; Tao Yan; Liangyi Huang; Mingfeng Wang; Ibragim R. Atadjanov
Corresponding author:
Ibragim R. Atadjanov

Magnetic-susceptibility-dependent ratiometric probes for enhancing quantitative MRI

DOI:
10.1038/s41551-024-01286-4
Publication date:
2024-11-29
Journal:
Nature Biomedical Engineering
Impact factor:
26.600
Authors:
Cheng Zhang; Bin Nan; Juntao Xu; Tengxiang Yang; Li Xu; Chang Lu; Xiao-Bing Zhang; Jianghong Rao; Guosheng Song
Corresponding author:
Guosheng Song