Inference in High-Dimensional Statistical Models: Algorithmic Tractability and Computational Barriers

高维统计模型中的推理:算法易处理性和计算障碍

基本信息

  • 批准号:
    2015517
  • 负责人:
  • 金额:
    $ 20万
  • 依托单位:
  • 依托单位国家:
    美国
  • 项目类别:
    Standard Grant
  • 财政年份:
    2020
  • 资助国家:
    美国
  • 起止时间:
    2020-09-01 至 2023-08-31
  • 项目状态:
    已结题

项目摘要

Extracting knowledge from data using statistical and machine learning methods often involves computations, which don't scale well with dataset sizes. This is dictated by the necessity of analyzing large scale statistical models, where the scale of the data ever increases due to our unprecedented ability to accumulative massive amounts of it. Often this leads to models where the number of parameters far exceeds the amount of collected data, rendering many classical inference models ill-posed and classical computational methods prohibitively time consuming. Thus the value brought about by the abundance of data comes at the expense of the necessity to develop completely novel computational tools that are capable of dealing with the curse of dimensionality. While there is an abundance of literature devoted to designing efficient computational methods of inference in high-dimensional statistical models, it was discovered that many algorithms hit a certain computational barrier, beyond which seemingly only brute-force and thus computationally prohibitive algorithms can succeed. Not much is known regarding the fundamental computational limitations arising above this barrier, which is popularly dubbed the nformation Theoretic vs Computation gap. What is the origin of this barrier? Does it indeed correspond to the onset of algorithmically intractable problems, or is it just a matter of being more clever about designing faster algorithms? The project also provides research training opportunities for graduate students. In the present project the PI develops a completely novel approach for understanding fundamental computational barriers arising in high dimensional statistical models. The approach is based on powerful and illuminating insights derived from the field of statistical physics, specifically the theory of spin glasses. In particular, the PI intends to establish that the onset of the algorithmic barriers is caused by phase transition in the landscape of the solution space, marking a drastic change in the solution space geometry of underlying inference problems. This change in geometry of the solution space landscape taking the form of the so-called Overlap Gap Property (OGP), can further be used to rule out broad classes of algorithms as potential contenders to bridge the information theoretic and algorithmic gap. These classes of algorithms include algorithms based on local improvements, such as Gradient Descent and Stochastic Gradient Descend algorithms, algorithms based on Markov Chain Monte Carlo Method, algorithms broadly defined as Approximate Message Passing iterations, and algorithms based on constructing low-degree polynomials. The PI in particular intends to investigate the validity of a bold conjecture stating that for most, if not all of the known models exhibiting apparent algorithmic barriers, the onset of this barrier coincides with the onset of the OGP. The PI intends to investigate this conjecture in the context of several widely studied modern models of high dimensional statistics and machine learning fields, including the Stochastic Block Model, the Spiked Tensor Model, and Wide Neural Networks model. All of these models are known to exhibit an apparent algorithmic hardness in some parameter regimes and thus these models offer a valuable framework for investigating the validity of the aforementioned conjecture, as well as algorithmic intractability implications.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
使用统计和机器学习方法从数据中提取知识通常涉及计算,这些计算不能很好地扩展数据集的大小。这是由于分析大规模统计模型的必要性,由于我们前所未有的积累大量数据的能力,数据的规模不断增加,这通常会导致模型的参数数量远远超过收集的数据量,使许多经典的推理模型不适定,经典的计算方法非常耗时。因此,丰富的数据所带来的价值是以开发能够处理维数灾难的全新计算工具的必要性为代价的。虽然有大量的文献致力于在高维统计模型中设计有效的计算推理方法,但人们发现,许多算法都遇到了一定的计算障碍,超过了这个障碍,似乎只有蛮力算法才能成功。关于这一障碍之上产生的基本计算限制,人们所知不多,这通常被称为信息理论与计算差距。这道屏障的起源是什么?它是否真的对应于算法上难以解决的问题的出现,或者它只是一个设计更快算法的更聪明的问题?该项目还为研究生提供研究培训机会。在本项目中,PI开发了一种全新的方法来理解高维统计模型中产生的基本计算障碍。该方法是基于强大的和启发性的见解来自统计物理学领域,特别是自旋玻璃理论。特别是,PI打算建立算法障碍的发作是由解空间景观中的相变引起的,标志着底层推理问题的解空间几何结构的急剧变化。解决方案空间景观的几何形状的这种变化采取所谓的重叠间隙属性(OGP)的形式,可以进一步用于排除广泛的算法类别作为弥合信息理论和算法差距的潜在竞争者。这些类别的算法包括基于局部改进的算法,例如梯度下降和随机梯度下降算法,基于马尔可夫链蒙特卡罗方法的算法,广义上定义为近似消息传递迭代的算法,以及基于构造低次多项式的算法。PI特别打算研究一个大胆猜想的有效性,该猜想指出,对于大多数(如果不是所有)表现出明显算法障碍的已知模型,该障碍的发生与OGP的发生一致。PI打算在几个广泛研究的高维统计和机器学习领域的现代模型的背景下研究这个猜想,包括随机块模型,尖峰张量模型和宽神经网络模型。所有这些模型是已知的,表现出明显的算法硬度在某些参数制度,因此这些模型提供了一个有价值的框架,调查上述猜想的有效性,以及算法的棘手性implications.This奖项反映了NSF的法定使命,并已被认为是值得通过使用基金会的智力价值和更广泛的影响审查标准进行评估的支持。

项目成果

期刊论文数量(8)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)
Computing the partition function of the Sherrington–Kirkpatrick model is hard on average
平均而言,计算 Sherrington–Kirkpatrick 模型的配分函数很困难
  • DOI:
    10.1214/20-aap1625
  • 发表时间:
    2021
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Gamarnik, David;Kızıldağ, Eren C.
  • 通讯作者:
    Kızıldağ, Eren C.
Performance and limitations of the QAOA at constant levels on large sparse hypergraphs and spin glass models
The overlap gap property and approximate message passing algorithms for $p$-spin models
$p$-spin 模型的重叠间隙属性和近似消息传递算法
  • DOI:
    10.1214/20-aop1448
  • 发表时间:
    2021
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Gamarnik, David;Jagannath, Aukosh
  • 通讯作者:
    Jagannath, Aukosh
Sharp Thresholds Imply Circuit Lower Bounds: from random 2-SAT to Planted Clique
尖锐的阈值意味着电路下限:从随机 2-SAT 到种植派
  • DOI:
  • 发表时间:
    2023
  • 期刊:
  • 影响因子:
    0
  • 作者:
    David Gamarnik;Elchanan Mossel;Ilias Zadik
  • 通讯作者:
    Ilias Zadik
Geometric Barriers for Stable and Online Algorithms for Discrepancy Minimization
用于差异最小化的稳定和在线算法的几何障碍
{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ monograph.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ sciAawards.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ conferencePapers.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ patent.updateTime }}

David Gamarnik其他文献

Hamiltonian completions of sparse random graphs
  • DOI:
    10.1016/j.dam.2005.05.001
  • 发表时间:
    2005-11-01
  • 期刊:
  • 影响因子:
  • 作者:
    David Gamarnik;Maxim Sviridenko
  • 通讯作者:
    Maxim Sviridenko
Integrating High-Dimensional Functions Deterministically
确定性地积分高维函数
  • DOI:
    10.48550/arxiv.2402.08232
  • 发表时间:
    2024
  • 期刊:
  • 影响因子:
    0
  • 作者:
    David Gamarnik;Devin Smedira
  • 通讯作者:
    Devin Smedira
The stability of the deterministic Skorokhod problem is undecidable
  • DOI:
    10.1007/s11134-014-9424-8
  • 发表时间:
    2014-10-19
  • 期刊:
  • 影响因子:
    0.700
  • 作者:
    David Gamarnik;Dmitriy Katz
  • 通讯作者:
    Dmitriy Katz
Computing the Volume of a Restricted Independent Set Polytope Deterministically
确定性计算受限独立集多面体的体积
  • DOI:
    10.48550/arxiv.2312.03906
  • 发表时间:
    2023
  • 期刊:
  • 影响因子:
    0
  • 作者:
    David Gamarnik;Devin Smedira
  • 通讯作者:
    Devin Smedira
On the Value of a Random Minimum Weight Steiner Tree
  • DOI:
    10.1007/s00493-004-0013-z
  • 发表时间:
    2004-04-01
  • 期刊:
  • 影响因子:
    1.000
  • 作者:
    Béla Bollobás*;David Gamarnik;Oliver Riordan;Benny Sudakov†
  • 通讯作者:
    Benny Sudakov†

David Gamarnik的其他文献

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

{{ truncateString('David Gamarnik', 18)}}的其他基金

AF: Small: Low-Degree Methods for Optimization in Random Structures. Power and Limitations
AF:小:随机结构优化的低度方法。
  • 批准号:
    2233897
  • 财政年份:
    2023
  • 资助金额:
    $ 20万
  • 项目类别:
    Standard Grant
Local Algorithms for Random Networks: Power, Limitations and Applications
随机网络的局部算法:能力、限制和应用
  • 批准号:
    1335155
  • 财政年份:
    2013
  • 资助金额:
    $ 20万
  • 项目类别:
    Standard Grant
Statistical Physics Methods and Algorithmic Applications in Graphical Games and Combinatorial Optimization
统计物理方法和算法在图形游戏和组合优化中的应用
  • 批准号:
    1031332
  • 财政年份:
    2010
  • 资助金额:
    $ 20万
  • 项目类别:
    Standard Grant
Stochastic Networks in the Heavy Traffic Regime: Algorithms, Approximations and Applications
高流量情况下的随机网络:算法、近似值和应用
  • 批准号:
    0726733
  • 财政年份:
    2007
  • 资助金额:
    $ 20万
  • 项目类别:
    Standard Grant

相似国自然基金

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • 批准号:
  • 批准年份:
    2024
  • 资助金额:
    万元
  • 项目类别:
    合作创新研究团队

相似海外基金

CAREER: Towards Tight Guarantees of Markov Chain Sampling Algorithms in High Dimensional Statistical Inference
职业:高维统计推断中马尔可夫链采样算法的严格保证
  • 批准号:
    2237322
  • 财政年份:
    2023
  • 资助金额:
    $ 20万
  • 项目类别:
    Continuing Grant
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
职业:高维海量数据的计算机密集统计推断:从理论基础到实际计算
  • 批准号:
    2347760
  • 财政年份:
    2023
  • 资助金额:
    $ 20万
  • 项目类别:
    Continuing Grant
Statistical learning and causal inference in high-dimensional genomics data across multiple information layers
跨多个信息层的高维基因组数据的统计学习和因果推理
  • 批准号:
    DGECR-2022-00445
  • 财政年份:
    2022
  • 资助金额:
    $ 20万
  • 项目类别:
    Discovery Launch Supplement
Statistical learning and causal inference in high-dimensional genomics data across multiple information layers
跨多个信息层的高维基因组数据的统计学习和因果推理
  • 批准号:
    RGPIN-2022-03708
  • 财政年份:
    2022
  • 资助金额:
    $ 20万
  • 项目类别:
    Discovery Grants Program - Individual
Topics in Statistical Modelling and Inference with High-Dimensional, Complex Data
高维、复杂数据的统计建模和推理主题
  • 批准号:
    RGPIN-2017-05720
  • 财政年份:
    2022
  • 资助金额:
    $ 20万
  • 项目类别:
    Discovery Grants Program - Individual
Collaborative Research: Statistical Inference for High-dimensional Spatial-Temporal Process Models
合作研究:高维时空过程模型的统计推断
  • 批准号:
    2113779
  • 财政年份:
    2021
  • 资助金额:
    $ 20万
  • 项目类别:
    Standard Grant
Collaborative Research: Statistical Inference for High-dimensional Spatial-Temporal Process Models
合作研究:高维时空过程模型的统计推断
  • 批准号:
    2113778
  • 财政年份:
    2021
  • 资助金额:
    $ 20万
  • 项目类别:
    Standard Grant
High-dimensional statistical inference in parametric and nonparametric models
参数和非参数模型中的高维统计推断
  • 批准号:
    RGPIN-2016-06262
  • 财政年份:
    2021
  • 资助金额:
    $ 20万
  • 项目类别:
    Discovery Grants Program - Individual
High-dimensional statistical inference: model diagnostics, covariance matrix estimation and overdispersion data.
高维统计推断:模型诊断、协方差矩阵估计和过度离散数据。
  • 批准号:
    RGPIN-2016-05174
  • 财政年份:
    2021
  • 资助金额:
    $ 20万
  • 项目类别:
    Discovery Grants Program - Individual
Topics in Statistical Modelling and Inference with High-Dimensional, Complex Data
高维、复杂数据的统计建模和推理主题
  • 批准号:
    RGPIN-2017-05720
  • 财政年份:
    2021
  • 资助金额:
    $ 20万
  • 项目类别:
    Discovery Grants Program - Individual
{{ showInfoDetail.title }}

作者:{{ showInfoDetail.author }}

知道了