CAREER: New Statistical Paradigms Reconciling Empirical Surprises in Modern Machine Learning

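Basic Information

  • Award number:
    2042473
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Program type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Start and end dates:
    2021-07-01 to 2026-06-30
  • Project status:
    Ongoing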

Project Abstract

Exciting empirical breakthroughs have emerged in data science and engineering through the combination of large-scale datasets, increasingly complex statistical models, and advanced computational power. This success also promises new directions in statistics and econometrics, among other scientific disciplines. Nevertheless, the empirical phenomena exhibited by modern machine learning (ML) challenge core mathematical concepts in statistics and computation: (a) Why can complex over-parametrized models achieve excellent statistical performance even when they interpolate the training examples? (b) Why can seemingly simple stochastic optimization methods optimize such complex models effectively? (c) What kinds of structures or representations of data make modern ML models more effective than classical statistical models once the dimension becomes moderately large? This project aims to develop new statistical and computational paradigms that bridge the gap between theory and practice in learning from data. The project will also significantly impact the training of undergraduate and graduate students in data science research through synergistic educational and research activities hosted under a new initiative that integrates and enhances resources across statistics and economics.

The project will investigate the role of regularization, the statistical performance, and the optimization algorithms of modern ML models, including kernel machines, boosting, random forests, and neural networks. In particular, the PI will focus on the following three modules.

(a) Learning functions in the interpolation/overfitting regime: The PI will study the statistical performance of minimum-norm interpolating solutions, which fall outside the realm of classical empirical risk minimization analysis. The PI also plans to develop a rigorous mathematical framework to quantify the adaptive representation aspects of specific ML models.

(b) Learning distributions with generative models and simulation-based inference: The PI will investigate the statistical foundations of generative models for learning implicit probability distributions and study new simulation-based inference procedures.

(c) Optimization algorithms motivated by stochastic approximation and online learning: The PI will study the interplay between optimization and statistical performance of gradient-based stochastic approximation methods for learning complex ML models with non-convex landscapes.

The research intends to challenge conventional wisdom in statistics and computation, modernize education in nonparametric statistics and learning theory, and shed further light on designing the next generation of nonparametric models with algorithms and computation in mind.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
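
Module (a) refers to minimum-norm interpolating solutions. The Python/NumPy sketch here only illustrates that notion in the simplest possible setting, an over-parametrized linear model; it is not the PI's method, and the dimensions, random seed, and step count are arbitrary assumptions. With more parameters than samples, infinitely many weight vectors fit the training data exactly, and the Moore-Penrose pseudoinverse selects the one with the smallest Euclidean norm. The sketch also checks a standard fact that links module (a) to the optimization questions of module (c): plain gradient descent on the squared loss, started at zero, converges to that same minimum-norm interpolant, because every gradient lies in the row space of the design matrix.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Minimum-l2-norm solution of X w = y via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ y


def gradient_descent_from_zero(X, y, steps=2000):
    """Full-batch gradient descent on 0.5 * ||X w - y||^2, started at w = 0.

    Each gradient X^T (X w - y) lies in the row space of X, so the iterates
    never leave that subspace; when X has full row rank they converge to the
    minimum-norm interpolant.
    """
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the largest eigenvalue of X^T X
    # (the squared largest singular value of X).
    lr = 1.0 / np.linalg.norm(X, ord=2) ** 2
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w


# Illustrative, hypothetical sizes: n samples, d parameters, d > n.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_pinv = min_norm_interpolant(X, y)
w_gd = gradient_descent_from_zero(X, y)

print(np.max(np.abs(X @ w_pinv - y)))  # training residual, near zero: the fit interpolates
print(np.linalg.norm(w_pinv - w_gd))   # near zero: gradient descent finds the same solution
```

Kernel "ridgeless" regression uses the same construction in a reproducing kernel Hilbert space, with the kernel matrix playing the role of X X^T.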

Project Outcomes

Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
Online Learning to Transport via the Minimal Selection Principle
  • DOI:
  • Publication date:
    2022-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Wenxuan Guo;Y. Hur;Tengyuan Liang;Christopher Ryan
  • Corresponding author:
    Wenxuan Guo;Y. Hur;Tengyuan Liang;Christopher Ryan
How Well Generative Adversarial Networks Learn Distributions
  • DOI:
    10.2139/ssrn.3714011
  • Publication date:
    2018-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tengyuan Liang
  • Corresponding author:
    Tengyuan Liang
Deep Neural Networks for Estimation and Inference
  • DOI:
    10.3982/ecta16901
  • Publication date:
    2021-01-01
  • Journal:
  • Impact factor:
    6.1
  • Authors:
    Farrell, Max H.;Liang, Tengyuan;Misra, Sanjog
  • Corresponding author:
    Misra, Sanjog
Universal Prediction Band via Semi-Definite Programming

Other Publications by Tengyuan Liang

On the Minimax Optimality of Estimating the Wasserstein Metric
  • DOI:
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tengyuan Liang
  • Corresponding author:
    Tengyuan Liang
On the Risk of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels
  • DOI:
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tengyuan Liang;A. Rakhlin;Xiyu Zhai
  • Corresponding author:
    Xiyu Zhai
On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results
  • DOI:
  • Publication date:
    2018-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tengyuan Liang
  • Corresponding author:
    Tengyuan Liang
Inference and Learning: Computational Difficulty and Efficiency
  • DOI:
  • Publication date:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tengyuan Liang
  • Corresponding author:
    Tengyuan Liang
Estimating Certain Integral Probability Metrics (IPMs) Is as Hard as Estimating under the IPMs

Similar Overseas Grants

CAREER: New Frameworks for Ethical Statistical Learning: Algorithmic Fairness and Privacy
  • Award number:
    2340241
  • Fiscal year:
    2024
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: New Challenges in Statistical Genetics: Mendelian Randomization, Integrated Omics and General Methodology
  • Award number:
    2238656
  • Fiscal year:
    2023
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: New Statistical Approaches for Studying Evolutionary Processes: Inference, Attribution and Computation
  • Award number:
    2143242
  • Fiscal year:
    2022
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: Towards a New Synthesis of Statistical Learning and Logical Reasoning
  • Award number:
    1943641
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: New Statistical Methods for Classification and Analysis of High Dimensional and Functional Data
  • Award number:
    1812354
  • Fiscal year:
    2017
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: A New Neat Framework for Statistical Machine Learning
  • Award number:
    1661755
  • Fiscal year:
    2016
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: New Techniques for Statistical Learning and Multivariate Analysis
  • Award number:
    1554821
  • Fiscal year:
    2016
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: A New Neat Framework for Statistical Machine Learning
  • Award number:
    1149803
  • Fiscal year:
    2012
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: New Statistical Methods for Classification and Analysis of High Dimensional and Functional Data
  • Award number:
    1055210
  • Fiscal year:
    2011
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant
CAREER: A New Statistical Framework for Natural Images with Applications in Vision
  • Award number:
    0953373
  • Fiscal year:
    2010
  • Award amount:
    $400,000
  • Program type:
    Continuing Grant