Discovering Sparse Covariance Structures in High Dimensions

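Basic Information

Award number:
0805798
Principal investigator:
Elizaveta Levina
Amount:
$250,000
Host institution:
Regents of the University of Michigan - Ann Arbor
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2008
Funding country:
United States
Project period:
2008-06-01 to 2012-05-31
Project status:
Completed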

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0805798&HistoricalAwards=false
Keywords:
Discovering Sparse Covariance Structures Dimensions

Project Abstract

This project focuses on discovering and exploiting sparse structures in the data to improve estimation of covariance matrices in high dimensions. The covariance matrix plays a key role in many data analysis methods, including principal component analysis, discriminant analysis, inference about the means in multivariate analysis, and inference about independence and conditional independence relationships in graphical models. Advances in random matrix theory have shown that the traditional estimator, the sample covariance, performs poorly in high dimensions. The existing research on alternative estimators, including previous work of the PI, focuses mostly on the situation when there is a notion of distance or ordering for the variable indexes (time series, longitudinal data, spatial data, spectroscopy, etc). However, there are many applications where such ordering is not available: for example, genetics, financial, social and economic data. This project develops several methods for constructing regularized sparse estimators that are invariant to variable permutations, both for the covariance matrix and its inverse. The main building blocks of the methods are thresholding, smooth penalties that encourage sparsity, permutation-invariant loss functions, adaptive weights, and manifold projections to discover potential structured re-orderings of the variables. Analytical results establishing consistency and convergence rates of the proposed estimators in high dimensions are fully developed. These theoretical results in high dimensions require tools that are different from standard asymptotic analysis, and there are few available in the existing literature. Efficient optimization algorithms needed to compute these estimators are developed, with the emphasis on the computational cost growing as slowly as possible with dimension. 
Some of the estimators proposed carry a very low computation cost by design, while others require computational ingenuity to be feasible in really high dimensions. The proposed methodology is tested extensively, both in simulations and on a number of applications through the PI's interdisciplinary collaborations.

Massive amounts of data collected in the modern world are creating new challenges for statisticians. There is an urgent need for new theoretical and practical methods that deal with high-dimensional data, and a vast number of applications where high-dimensional covariance matrices need to be estimated as part of data analysis: finance, genetics, spectroscopy, remote sensing, climate studies, brain imaging, speech recognition, and many others. The PI has ongoing collaborations with chemists on Raman spectroscopy of bone, with oceanologists on using spectral data for remote ocean sensing, with climate scientists on temperature modeling and with a biostatistician on a new type of gene expression technology that works at protein level. The PI also works actively in the area of statistical signal processing by wireless sensor networks, where spatial covariance estimation is important, and which has many security applications. The new methodology for estimating high-dimensional covariances developed in this project is analyzed theoretically and tested and validated in these applications, and in turn, the directions in which the project develops at later stages are influenced by the issues and needs of the applications. The project also contributes to educating graduate students in an important area of modern statistics.
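
As a rough illustration of the thresholding building block named in the abstract, the sketch below hard-thresholds the off-diagonal entries of a sample covariance matrix and checks that the result does not depend on the ordering of the variables. It is not the project's estimator: the function name `hard_threshold_covariance`, the simulated data, and the threshold `t = 2 * sqrt(log(p) / n)` are all assumptions chosen for illustration.

```python
import numpy as np

def hard_threshold_covariance(X, t):
    """Sample covariance with off-diagonal entries of magnitude <= t set to zero.

    Hypothetical helper for illustration; rows of X are observations.
    """
    S = np.cov(X, rowvar=False)            # p x p sample covariance
    S_t = np.where(np.abs(S) > t, S, 0.0)  # entrywise hard thresholding
    np.fill_diagonal(S_t, np.diag(S))      # variances are never thresholded
    return S_t

# Simulated data (an assumption): n = 50 observations of p = 200 independent
# standard normal variables, so the true covariance is the identity.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))

# Assumed threshold of order sqrt(log(p) / n); the constant 2.0 is arbitrary.
t = 2.0 * np.sqrt(np.log(p) / n)
S_t = hard_threshold_covariance(X, t)

# Permutation invariance: reordering the variables before estimation gives
# the same matrix as reordering the rows and columns of the estimate.
perm = rng.permutation(p)
S_t_perm = hard_threshold_covariance(X[:, perm], t)
assert np.allclose(S_t_perm, S_t[np.ix_(perm, perm)])
```

Because each entry is thresholded on its own, no distance or ordering between variable indexes is needed, which is the setting the abstract describes. In practice the threshold would typically be chosen from the data (for example by cross-validation) rather than fixed as here.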
