
CAREER: New Statistical Paradigms Reconciling Empirical Surprises in Modern Machine Learning


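Basic Information

Award number:
2042473
Principal investigator:
Tengyuan Liang
Amount:
$400,000
Host institution:
University of Chicago
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Project period:
2021-07-01 to 2026-06-30
Project status:
Ongoing (not yet closed)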

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2042473&HistoricalAwards=false
Keywords:
CAREER New Statistical Paradigms Reconciling

Project Abstract

Exciting empirical breakthroughs have emerged in data science and engineering through combination of large-scale datasets, increasingly complex statistical models, and advanced computational power. The success also promises new directions in statistics and econometrics, among other scientific disciplines. Nevertheless, the empirical phenomena exhibited by modern Machine Learning (ML) challenge the core mathematical concepts in statistics and computation: (a) Why can complex over-parametrized models enjoy excellent statistical performances even with interpolating the training examples? (b) Why can seemingly simple stochastic optimization methods optimize such complex models effectively? (c) What kinds of structures or representations of data are responsible for modern ML models’ efficacy over classical statistical models when the dimension becomes moderately large? This project aims to develop new statistical and computational paradigms that bridge the gap between theory and practice for learning from data. The project will also significantly impact undergraduate and graduate students’ training in data science research through synergetic educational and research activities to be hosted under a new initiative that integrates and enhances resources across the fields of statistics and economics.

The project will investigate the role of regularization, statistical performance, and optimization algorithms in modern ML models, including kernel machines, boosting, random forests, and neural networks. In particular, the PI will focus on the following three modules.

(a) Learning functions in the interpolation/overfitting regime: The PI will study the statistical performance of minimum-norm interpolated solutions, which fall beyond the realm of the classical empirical risk minimization analysis. The PI also plans to develop a rigorous mathematical framework to quantify the adaptive representation aspects of specific ML models.

(b) Learning distributions with generative models and simulation-based inference: The PI will investigate the statistical foundations of generative models for learning implicit probability distributions and study new simulation-based inference procedures.

(c) Optimization algorithms motivated by stochastic approximation and online learning: The PI will study the interplay between optimization and statistical performance of gradient-based stochastic approximation methods for learning complex ML models with non-convex landscapes.

The research intends to challenge conventional wisdom in statistics and computation, modernize nonparametric statistics and learning theory education, and further shed light on devising the next generation nonparametric models with algorithms and computation in mind.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
