Discovering Sparse Covariance Structures in High Dimensions

Basic Information

Project Abstract

This project focuses on discovering and exploiting sparse structures in the data to improve estimation of covariance matrices in high dimensions. The covariance matrix plays a key role in many data analysis methods, including principal component analysis, discriminant analysis, inference about the means in multivariate analysis, and inference about independence and conditional independence relationships in graphical models. Advances in random matrix theory have shown that the traditional estimator, the sample covariance, performs poorly in high dimensions. The existing research on alternative estimators, including previous work of the PI, focuses mostly on the situation when there is a notion of distance or ordering for the variable indices (time series, longitudinal data, spatial data, spectroscopy, etc.). However, there are many applications where such an ordering is not available: for example, genetic, financial, social, and economic data. This project develops several methods for constructing regularized sparse estimators that are invariant to variable permutations, both for the covariance matrix and for its inverse. The main building blocks of the methods are thresholding, smooth penalties that encourage sparsity, permutation-invariant loss functions, adaptive weights, and manifold projections to discover potential structured re-orderings of the variables. Analytical results establishing consistency and convergence rates of the proposed estimators in high dimensions are fully developed. These theoretical results in high dimensions require tools that are different from standard asymptotic analysis, and few such tools are available in the existing literature. Efficient optimization algorithms needed to compute these estimators are developed, with an emphasis on making the computational cost grow as slowly as possible with dimension. Some of the proposed estimators carry a very low computational cost by design, while others require computational ingenuity to be feasible in very high dimensions. The proposed methodology is tested extensively, both in simulations and on a number of applications through the PI's interdisciplinary collaborations.

Massive amounts of data collected in the modern world are creating new challenges for statisticians. There is an urgent need for new theoretical and practical methods that deal with high-dimensional data, and there is a vast number of applications where high-dimensional covariance matrices need to be estimated as part of data analysis: finance, genetics, spectroscopy, remote sensing, climate studies, brain imaging, speech recognition, and many others. The PI has ongoing collaborations with chemists on Raman spectroscopy of bone, with oceanologists on using spectral data for remote ocean sensing, with climate scientists on temperature modeling, and with a biostatistician on a new type of gene expression technology that works at the protein level. The PI also works actively in the area of statistical signal processing by wireless sensor networks, where spatial covariance estimation is important and which has many security applications. The new methodology for estimating high-dimensional covariances developed in this project is analyzed theoretically and tested and validated in these applications; in turn, the directions in which the project develops at later stages are influenced by the issues and needs of the applications. The project also contributes to educating graduate students in an important area of modern statistics.
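To make the thresholding building block named in the abstract concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's estimator: the function name `hard_threshold_covariance`, the simulated Gaussian data, and the threshold constant 2.0 are all hypothetical choices, and the sqrt(log p / n) scaling is only a commonly used rate for this kind of estimator. The sketch also checks the permutation invariance the abstract asks for: reordering the variables simply reorders the rows and columns of the estimate.

```python
import numpy as np


def hard_threshold_covariance(X, threshold):
    """Hard-threshold the sample covariance of X (n samples by p variables).

    Off-diagonal entries with absolute value below `threshold` are set to
    zero; the diagonal (the sample variances) is always kept.
    Illustrative helper, not taken from the project.
    """
    S = np.cov(X, rowvar=False)                    # p x p sample covariance
    T = np.where(np.abs(S) >= threshold, S, 0.0)   # entrywise hard thresholding
    np.fill_diagonal(T, np.diag(S))                # never threshold the variances
    return T


# Simulated data with more variables than samples (assumed sizes).
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))

# Assumed threshold: a constant times sqrt(log p / n); the constant is arbitrary.
threshold = 2.0 * np.sqrt(np.log(p) / n)
estimate = hard_threshold_covariance(X, threshold)

# Permutation invariance: permuting the variables permutes the estimate
# in the same way, because thresholding acts on each entry separately.
perm = rng.permutation(p)
estimate_perm = hard_threshold_covariance(X[:, perm], threshold)
print(np.allclose(estimate_perm, estimate[np.ix_(perm, perm)]))
```

Because each entry is thresholded on its own, no ordering or distance between variables is needed, which is the setting the abstract targets. In practice the threshold would usually be chosen from the data, for example by cross-validation, rather than fixed in advance as it is here.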

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Elizaveta Levina

FRG: Collaborative Research: Flexible Network Inference
  • Award number:
    2052918
  • Fiscal year:
    2021
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
Multivariate Analysis for Samples of Networks
  • Award number:
    1916222
  • Fiscal year:
    2019
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
RTG: Understanding dynamic big data with complex structure
  • Award number:
    1646108
  • Fiscal year:
    2017
  • Funding amount:
    $250,000
  • Project type:
    Continuing Grant
Conference proposal: From Industrial Statistics to Data Science
  • Award number:
    1542123
  • Fiscal year:
    2015
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
Statistical Tools for Analyzing Multiple Networks
  • Award number:
    1521551
  • Fiscal year:
    2015
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
FRG: Collaborative Research: Unified statistical theory for the analysis and discovery of complex networks
  • Award number:
    1159005
  • Fiscal year:
    2012
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
Statistical Methods for Network Data
  • Award number:
    1106772
  • Fiscal year:
    2011
  • Funding amount:
    $250,000
  • Project type:
    Continuing Grant
Exploiting Special Structures in High-Dimensional Data Classification
  • Award number:
    0505424
  • Fiscal year:
    2005
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant

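Similar NSFC (National Natural Science Foundation of China) grants

SAR image noise suppression and segmentation based on the Sparse-Land model
  • Award number:
    60971128
  • Year approved:
    2009
  • Funding amount:
    CNY 300,000
  • Project type:
    General Program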

Similar overseas grants

The Global Structure of Sparse Networks
  • Award number:
    DP240100198
  • Fiscal year:
    2024
  • Funding amount:
    $250,000
  • Project type:
    Discovery Projects
CAREER: Compiler and Runtime Support for Sampled Sparse Computations on Heterogeneous Systems
  • Award number:
    2338144
  • Fiscal year:
    2024
  • Funding amount:
    $250,000
  • Project type:
    Continuing Grant
ERI: AI-Enhanced Dynamic Interference Suppression in Cognitive Sensing with Reconfigurable Sparse Arrays
  • Award number:
    2347220
  • Fiscal year:
    2024
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
Creating digital twins of flows from noisy and sparse flow-MRI data
  • Award number:
    EP/X028232/1
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Fellowship
Inverting turbulence: flow patterns and parameters from sparse data
  • Award number:
    EP/X017273/1
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Research Grant
CIF: Small: Learning Sparse Vector and Matrix Graphs from Time-Dependent Data
  • Award number:
    2308473
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
Sparse Sensor Array Design and Processing
  • Award number:
    2236023
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant
CAREER: Physics-inspired Machine Learning with Sparse and Asynchronous p-bits
  • Award number:
    2237357
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Continuing Grant
Realization of sparse control with model predictive control and guarantee of its performance
  • Award number:
    23K03916
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
ATD: Sparse and Localized Graph Convolutional Networks for Anomaly Detection and Active Learning
  • Award number:
    2220574
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Project type:
    Standard Grant