High-dimensional M-estimation: Understanding risk, improving performance and assessing resampling

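Basic information

  • Award number:
    1510172
  • Principal investigator:
  • Amount:
    $394,200
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-08-01 to 2019-10-31
  • Project status:
    Completed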

Project abstract

The nature of the datasets that scientists in academia and industry work with is changing at a very rapid pace. The data are more complex, larger and higher-dimensional than ever before. Many of the methods used in practice rest on the idea that they are in some sense optimal, in terms of measurement accuracy or prediction of future outcomes, at least for certain models of the data-generating mechanism. One of the most widely applied and time-honored principles of data analysis is the use of so-called "maximum likelihood methods". The PI and collaborators recently discovered that, in a setting often encountered with modern, large datasets, these maximum likelihood methods are suboptimal and can be improved upon. This is true even for an extremely basic and widely used technique such as linear regression. One aim of the project is to understand whether the same phenomena occur for other methods that are widely used in machine and statistical learning practice and, in turn, to develop better tools for data scientists and data analysts. Currently, accuracy assessment for these estimators is often performed through data-driven procedures such as the bootstrap. Another aim of the project is to understand whether the corresponding accuracy assessments are misleading for datasets with many predictors. If that is the case, the PI plans to work on methods that correct the existing procedures so that they yield trustworthy accuracy assessments.

High-dimensional statistics offers a profound challenge to classical statistics, on both the applied and the theoretical side. A broad class of methods used in practice is based on solving nontrivial optimization problems to estimate parameters of interest. This yields a so-called M-estimator. When the dimension of this estimator is small compared to the number of observations available to the practitioner, standard empirical process techniques can be applied to understand the statistical properties of those estimators. In the setting the PI considers, these techniques fail and new ones need to be developed. The PI plans to use a mix of tools inspired by random matrix theory, convex analysis and concentration of measure results to study those estimators. New optimal methods are expected to be developed using tools from convex analysis. Another exciting line of research is that the techniques developed by the PI should make it possible to study resampling methods, such as the bootstrap, in high dimension. These methods are widely used to assess statistical significance from the observed dataset without having to appeal to theoretical arguments. While the low-dimensional theory is well established and relatively easy, and suggests that these numerical methods should work well, the high-dimensional case has yet to be understood. The PI plans to study these problems thoroughly and to propose practically relevant solutions if these widely used methods are shown to provide statistically misleading accuracy assessments.
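
For readers unfamiliar with the two objects the abstract refers to, the following is a minimal sketch, not the project's method. It fits a regression M-estimator, meaning the coefficient vector that minimises the average of a loss applied to the residuals, and then computes a bootstrap standard error for one coefficient. Everything specific here is an assumption made for illustration: the Huber loss with tuning constant 1.345, the fixed-step gradient descent solver, the "pairs" form of the bootstrap, the simulated Gaussian data, and all function names.

```python
import random
import statistics

DELTA = 1.345  # conventional Huber tuning constant (assumed, not from the abstract)


def huber_rho(r, delta=DELTA):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def huber_psi(r, delta=DELTA):
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def residuals(X, y, beta):
    """Residuals y_i - x_i . beta for every observation."""
    return [yi - sum(a * b for a, b in zip(xi, beta)) for xi, yi in zip(X, y)]


def objective(X, y, beta):
    """Average Huber loss of the residuals: the quantity the M-estimator minimises."""
    return sum(huber_rho(r) for r in residuals(X, y, beta)) / len(y)


def m_estimate(X, y, steps=400):
    """Approximate minimiser of `objective` by fixed-step gradient descent from zero."""
    n, p = len(X), len(X[0])
    # trace(X'X / n) upper-bounds the Lipschitz constant of the gradient,
    # so its reciprocal is a step size that never increases the objective.
    step = 1.0 / (sum(v * v for row in X for v in row) / n)
    beta = [0.0] * p
    for _ in range(steps):
        psi = [huber_psi(r) for r in residuals(X, y, beta)]
        grad = [-sum(w * row[j] for w, row in zip(psi, X)) / n for j in range(p)]
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta


def pairs_bootstrap_se(X, y, coord=0, replicates=25, seed=0):
    """Bootstrap standard error of one coefficient, resampling (x_i, y_i) pairs."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        refit = m_estimate([X[i] for i in idx], [y[i] for i in idx])
        draws.append(refit[coord])
    return statistics.stdev(draws)


def simulate(n=40, p=10, seed=1):
    """Gaussian predictors, true coefficients all zero, standard Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    beta_hat = m_estimate(X, y)
    print("first coefficient:", beta_hat[0])
    print("bootstrap standard error:", pairs_bootstrap_se(X, y))
```

In this simulation the number of predictors is a quarter of the number of observations, which is the kind of regime the abstract describes. The sketch only shows how such a bootstrap number is produced; it does not by itself indicate whether that number is trustworthy, which is the question the project sets out to study.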

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Noureddine El Karoui

Kernel density estimation with Berkson error
  • DOI:
    10.1002/cjs.11281
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. P. Long; Noureddine El Karoui; J. Rice
  • Corresponding author:
    J. Rice
Revenue-Maximizing Auctions: A Bidder’s Standpoint
  • DOI:
    10.2139/ssrn.3827136
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Thomas Nedelec; Clément Calauzènes; Vianney Perchet; Noureddine El Karoui
  • Corresponding author:
    Noureddine El Karoui

Other grants by Noureddine El Karoui

CAREER: Random matrices and High-dimensional statistics
  • Award number:
    0847647
  • Fiscal year:
    2009
  • Amount:
    $394,200
  • Project type:
    Continuing Grant
Random Matrices in Multivariate Statistics: Theoretical Developments and Applications
  • Award number:
    0605169
  • Fiscal year:
    2006
  • Amount:
    $394,200
  • Project type:
    Standard Grant

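Similar NSFC (National Natural Science Foundation of China) grants

Time-related gene expression in tissue after muscle contusion and time elapsed since injury
  • Award number:
    81001347
  • Approval year:
    2010
  • Amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Computation- and memory-aware motion estimation algorithms and architectures
  • Award number:
    60803013
  • Approval year:
    2008
  • Amount:
    CNY 180,000
  • Project type:
    Young Scientists Fund
Synchronization and channel estimation in multi-user MIMO-OFDM systems
  • Award number:
    60302025
  • Approval year:
    2003
  • Amount:
    CNY 300,000
  • Project type:
    Joint Fund Project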

Similar overseas grants

Development of estimation methods for subsurface structures based on the deepened understanding of strain seismograms recorded with DAS
  • Award number:
    23K03521
  • Fiscal year:
    2023
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Computational Understanding and Estimation of Operational Feeling of Mechanical Systems
  • Award number:
    21H01296
  • Fiscal year:
    2021
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Understanding the Fast and Slow Spatiotemporal Dynamics of Human Seizures
  • Award number:
    10584583
  • Fiscal year:
    2019
  • Amount:
    $394,200
  • Project type:
Development of precision estimation method of rainfall erosivity for better understanding of soil erosion in data-sparse regions
  • Award number:
    19K13434
  • Fiscal year:
    2019
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Development of an estimation method of error causes in problems for representing learners' understanding of program behavior
  • Award number:
    19K21782
  • Fiscal year:
    2019
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Challenging Research (Exploratory)
Development of potential fishing ground estimation models by machine learning for understanding interspecies relationships
  • Award number:
    18K05803
  • Fiscal year:
    2018
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Understanding and developing deep learning as estimation procedures of the high-dimensional parameter
  • Award number:
    18K11208
  • Fiscal year:
    2018
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Statistical Estimation Development for understanding poverty situations in Small Area Estimation
  • Award number:
    18K12758
  • Fiscal year:
    2018
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Understanding Regression Heterogeneity Through Joint Estimation of Conditional Quantiles
  • Award number:
    1613173
  • Fiscal year:
    2016
  • Amount:
    $394,200
  • Project type:
    Standard Grant
Development of algorithms to increase estimation accuracy of surface displacement using PSInSAR analysis, and understanding complex surface displacement phenomena
  • Award number:
    15H06843
  • Fiscal year:
    2015
  • Amount:
    $394,200
  • Project type:
    Grant-in-Aid for Research Activity Start-up