CAREER: Random matrices and High-dimensional statistics

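Basic information

  • Award number:
    0847647
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Continuing Grant
  • Fiscal year:
    2009
  • Funding country:
    United States
  • Period:
    2009-08-01 to 2015-07-31
  • Project status:
    Completed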

Project abstract

This research program focuses on developing data analysis methods and a theoretical framework for the new paradigm of high-dimensional statistical problems. The theoretical problems concern spectral properties of large-dimensional random matrices. More precisely, four of the main objectives of the program are: 1) further develop new covariance estimation methods; 2) further our understanding of the spectral properties of relevant large random matrices; 3) find and contribute to areas of application where this high-dimensional statistics framework is relevant; 4) train graduate students in high-dimensional statistics and make undergraduate students at least aware of possible pitfalls of classical methods and of better alternatives when available. More specifically, statisticians are now often faced with "n by p" data matrices X, for which p, the number of variables recorded per observation, is of the same order of magnitude as n, the number of recorded observations, and p and n are both large. The sample covariance matrix computed from these data is of great importance to a number of applications, as it underlies widely used methods like principal components analysis. However, the theoretical results that underlie these methods, classically developed in the "small p and large n" setting, fail to apply in the "large n and large p" setting just described. Hence, a thorough study of sample covariance matrices in this setting is needed. Eigenvalues of such large-dimensional matrices are of particular interest.
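As a rough illustration of why the classical "small p and large n" results stop applying, the sketch below simulates Gaussian data whose population covariance is the identity, so every population eigenvalue equals 1. It is only an illustration under these assumptions, not the investigator's method: the function names, the sizes n = 2000 and p = 500, and the comparison with the Marchenko-Pastur support edges (1 ± sqrt(p/n))^2 are choices made here.

```python
import numpy as np


def sample_cov_eigenvalues(n, p, seed=0):
    """Eigenvalues of the sample covariance of n i.i.d. N(0, I_p) observations."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # n observations (rows), p variables (columns)
    S = X.T @ X / n                   # the mean is known to be zero, so no centering
    return np.linalg.eigvalsh(S)      # eigenvalues in ascending order


def marchenko_pastur_edges(n, p):
    """Edges of the limiting eigenvalue support when the population covariance is I."""
    gamma = p / n
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2


n, p = 2000, 500
eigs = sample_cov_eigenvalues(n, p)
lower, upper = marchenko_pastur_edges(n, p)

print("population eigenvalues: all equal to 1")
print(f"sample eigenvalues: min = {eigs.min():.3f}, max = {eigs.max():.3f}")
print(f"Marchenko-Pastur edges: lower = {lower:.3f}, upper = {upper:.3f}")
```

With p/n = 0.25 the Marchenko-Pastur edges are 0.25 and 2.25, so the sample eigenvalues spread over roughly that interval instead of clustering near the true value 1, even though n is large.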
The investigator plans to launch a multi-pronged effort to study various properties of these objects: for instance, he plans to develop theoretical results that will allow inferential work to be done from computation of extreme eigenvalues of sample covariance matrices, develop new methods of estimation of the whole covariance matrix, and also work on the impact of naively plugging in the sample covariance matrix as a proxy for the population covariance in certain optimization problems that depend on this latter parameter. An effort will be made to apply this theoretical work to real-world problems, both to raise awareness in applied communities about the pitfalls associated with high-dimensional covariance matrices, and to shape the models that will be studied to be of most relevance to applied researchers.
Technological progress allows us to store and use massive amounts of data about many aspects of our daily lives. An interesting problem is to use the data to understand how certain traits depend on each other. In the stock market, we might be interested in how the behavior of one stock affects the behavior of another stock; understanding all these interrelationships leads to a measure of the risk taken by investing in portfolios that use the corresponding stocks. Statisticians have a number of tools to deal with all these interrelationships. We can discover ways to look at the data so that, even if all interrelationships are small or weak, so that each trait "should" not help us learn much about any other trait, we might still find combinations of the traits that carry enormous amounts of information. We also know what typical values for these combinations are, so we might be able to detect unusual features in the data set by looking at it the right way. Those statistical techniques have very wide applications in various fields of science, ranging from climatology to genetics, image recognition, and finance, among others.
Thousands of research papers are published each year that use these techniques. However, the theory that underlies these statistical techniques was created in an era when massive datasets just did not exist. This research project focuses on theories, and their applications, that are better suited to handling our current massive datasets. The applications should allow us to see structure where the classical tools fail to see any, and tell us when there is no structure even though the classical tools tell us there is. We also have increasing evidence that our standard tools often give us very inaccurate results about our standard measures of risk or the amount of information carried in combinations of traits. It seems that risks might be underestimated and the amount of information might be overestimated. Part of this research program will be dedicated to measuring how inaccurate the classical results are for large datasets, how much practical predictions are affected, and how a more relevant theory can be used to correct these inaccuracies.
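The remark that risks might be underestimated can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not the investigator's analysis: it assumes Gaussian returns with identity population covariance, a known zero mean, and a minimum-variance portfolio whose weights sum to 1. The function name and the sizes n = 400 and p = 200 are hypothetical choices made here.

```python
import numpy as np


def min_variance_risks(n, p, seed=0):
    """Plug-in, realized, and optimal risk of a minimum-variance portfolio.

    Assumes n > p so that the sample covariance matrix is invertible.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # population covariance is the identity
    S = X.T @ X / n                   # sample covariance (mean known to be zero)
    ones = np.ones(p)
    z = np.linalg.solve(S, ones)      # S^{-1} 1
    w = z / (ones @ z)                # plug-in minimum-variance weights, summing to 1
    plug_in = w @ S @ w               # risk the naive plug-in calculation reports
    realized = w @ w                  # true risk w' Sigma w, with Sigma = I
    optimal = 1.0 / p                 # smallest achievable risk (equal weights)
    return plug_in, realized, optimal


n, p = 400, 200
plug_in, realized, optimal = min_variance_risks(n, p)
print(f"plug-in risk estimate: {plug_in:.5f}")
print(f"realized (true) risk:  {realized:.5f}")
print(f"optimal risk:          {optimal:.5f}")
```

In this setting the plug-in estimate falls below the optimal risk while the realized risk of the chosen portfolio lies above it, so the naive calculation is optimistic on both counts.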

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Noureddine El Karoui

Kernel density estimation with Berkson error
  • DOI:
    10.1002/cjs.11281
  • Publication year:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. P. Long; Noureddine El Karoui; J. Rice
  • Corresponding author:
    J. Rice
Revenue-Maximizing Auctions: A Bidder’s Standpoint
  • DOI:
    10.2139/ssrn.3827136
  • Publication year:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Thomas Nedelec; Clément Calauzènes; Vianney Perchet; Noureddine El Karoui
  • Corresponding author:
    Noureddine El Karoui

Other grants by Noureddine El Karoui

High-dimensional M-estimation: Understanding risk, improving performance and assessing resampling
  • Award number:
    1510172
  • Fiscal year:
    2015
  • Amount awarded:
    $400,000
  • Project category:
    Continuing Grant
Random Matrices in Multivariate Statistics: Theoretical Developments and Applications
  • Award number:
    0605169
  • Fiscal year:
    2006
  • Amount awarded:
    $400,000
  • Project category:
    Standard Grant

Similar international grants

Random Matrices and Functional Inequalities on Spaces of Graphs
  • Award number:
    2331037
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Continuing Grant
Collaborative Research: Random Matrices and Algorithms in High Dimension
  • Award number:
    2306438
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Continuing Grant
Some topics in Analysis and Probability in Metric Measure Spaces, Random Matrices, and Diffusions
  • Award number:
    2247117
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Standard Grant
Random Matrices, Random Graphs, and Deep Neural Networks
  • Award number:
    2331096
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Standard Grant
Conference: Random matrices from quantum chaos to the Riemann zeta function.
  • Award number:
    2306332
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Standard Grant
Collaborative Research: Random Matrices and Algorithms in High Dimension
  • Award number:
    2306439
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Continuing Grant
Random matrices, operators, and analytic functions
  • Award number:
    2246435
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Continuing Grant
Random structures in high dimensions: Matrices, polynomials and point processes
  • Award number:
    2246624
  • Fiscal year:
    2023
  • Amount awarded:
    $400,000
  • Project category:
    Standard Grant
Asymptotic Geometric Analysis, Random Matrices, and Applications
  • Award number:
    RGPIN-2022-03483
  • Fiscal year:
    2022
  • Amount awarded:
    $400,000
  • Project category:
    Discovery Grants Program - Individual
Free Probability and Random Matrices
  • Award number:
    RGPIN-2018-04458
  • Fiscal year:
    2022
  • Amount awarded:
    $400,000
  • Project category:
    Discovery Grants Program - Individual