CAREER: Statistical Learning from a Modern Perspective: Over-parameterization, Regularization, and Generalization

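Basic information

  • Award number:
    2143215
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Continuing Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Start and end dates:
    2022-09-01 to 2027-08-31
  • Project status:
    Ongoing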

Project abstract

Statistical methods have been a major driving force towards interpretable, actionable, and trustworthy machine learning. However, the existing statistical theory remains highly inadequate in explaining many new phenomena that emerge, and become pervasive in modern machine learning applications. For instance, the prevalence of over-parameterized models (i.e., the ones that have more model parameters than samples) challenges our classical statistical insights about the bias-variance tradeoff; the fact that many learning algorithms exhibit favorable algorithmic regularization to alleviate overfitting is largely beyond the reach of previous statistical literature, and the unconventional shapes of the risk curves in modern applications puzzle many statisticians. Compared to the rich theory developed for classical settings, however, the statistical underpinnings for these curious yet mysterious phenomena remain far from sufficient. Motivated by this, the overarching goal of the project is to enrich the statistical foundation of machine learning by adapting it to contemporary settings, thereby bridging classical statistics and cutting-edge machine learning. In addition, the project will provide valuable opportunities for training students (particularly underrepresented groups) at all levels across multiple disciplines in the STEM field, and will exert scientific and societal impacts on several domains beyond the tasks described herein, including but not limited to neuroscience, online education, and equitable machine learning.

Striving for interpretability and actionable insights, this project plans to revisit multiple classical statistical problems---ranging from minimum-norm interpolation, risk estimation, cross validation, kernel boosting, data-imbalanced classification, to transfer learning---with an emphasis on unveiling new insights for modern yet under-explored regimes. Several recurring themes include: (i) characterizing precise risk behavior in the face of large model complexity; (ii) reconciling the seemingly conflicting goals of over-parameterization and regularization; (iii) developing algorithm-specific statistical reasoning tools; and (iv) exploring the interplay between regularization and generalization. The project comprises three distinct yet related thrusts: (1) statistical insights for over-parameterization: which explores the prolific interplay between model complexity and out-of-sample performance; (2) algorithmic regularization via early stopping: which aims to develop statistical principles that underlie early stopping; (3) risk (non)-monotonicity with imbalanced data: which is motivated by the non-monotonicity of generalization errors in the sample size and pursues principled debiasing methods to rectify it. The project will develop a suite of statistical insights that can inform cutting-edge machine learning practice, as well as an array of statistical methodologies that will be practically appealing for modern data-driven applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Lasso with general Gaussian designs with applications to hypothesis testing
  • DOI:
    10.1214/23-aos2327
  • Publication date:
    2020-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Michael Celentano;A. Montanari;Yuting Wei
  • Corresponding authors:
    Michael Celentano;A. Montanari;Yuting Wei
Softmax policy gradient methods can take exponential time to converge
  • DOI:
    10.1007/s10107-022-01920-6
  • Publication date:
    2021-02
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Gen Li;Yuting Wei;Yuejie Chi;Yuantao Gu;Yuxin Chen
  • Corresponding authors:
    Gen Li;Yuting Wei;Yuejie Chi;Yuantao Gu;Yuxin Chen
Derandomizing Knockoffs

Other publications by Yuting Wei

Advances in chondroitinase delivery for spinal cord repair.
  • DOI:
    10.31083/j.jin2104118
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Yuting Wei;Melissa R. Andrews
  • Corresponding author:
    Melissa R. Andrews
Improved design method for line gear pair based on screw theory
From Gauss to Kolmogorov: Localized Measures of Complexity for Ellipses
The promoting effects of pyriproxyfen on autophagy and apoptosis in silk glands of non-target insect silkworm, Bombyx mori
  • DOI:
    10.1016/j.pestbp.2023.105586
  • Publication date:
    2023-11-01
  • Journal:
  • Impact factor:
    4.000
  • Authors:
    Guoli Li;Yizhe Li;Chunhui He;Yuting Wei;Kunpei Cai;Qingyu Lu;Xuebin Liu;Yizhou Zhu;Kaizun Xu
  • Corresponding author:
    Kaizun Xu
Measurement of the half-life of 95mTc and the 96Ru (n, x) 95mTc reaction cross-section induced by D–T neutron with covariance analysis
  • DOI:
    10.1140/epja/s10050-022-00879-4
  • Publication date:
    2022
  • Journal:
  • Impact factor:
  • Authors:
    Yuting Wei;Changlin Lan;Yujie Ge;Xianlin Yang;Liyang Jiang;Yangbo Nie;Xiaojun Li;Jiahao Wang;Gong Jiang;Xichao Ruan;Xiaolong Huang;Xiaodong Pan
  • Corresponding author:
    Xiaodong Pan

Other grants by Yuting Wei

Collaborative Research: Fine-Grained Statistical Inference in High Dimension: Actionable Information, Bias Reduction, and Optimality
  • Award number:
    2147546
  • Fiscal year:
    2022
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
Collaborative Research: Fine-Grained Statistical Inference in High Dimension: Actionable Information, Bias Reduction, and Optimality
  • Award number:
    2015447
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant

Similar overseas grants

CAREER: New Frameworks for Ethical Statistical Learning: Algorithmic Fairness and Privacy
  • Award number:
    2340241
  • Fiscal year:
    2024
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Statistical Learning with Recursive Partitioning: Algorithms, Accuracy, and Applications
  • Award number:
    2239448
  • Fiscal year:
    2023
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Domain-aware Statistical Learning
  • Award number:
    2143695
  • Fiscal year:
    2022
  • Award amount:
    $400,000
  • Project category:
    Standard Grant
CAREER: Understanding metal/support interactions in catalysis with statistical learning
  • Award number:
    2143941
  • Fiscal year:
    2022
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Federated Learning: Statistical Optimality and Provable Security
  • Award number:
    2144593
  • Fiscal year:
    2022
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Designing Meaningful Learning Experiences for Statistical Literacy in Secondary Mathematics
  • Award number:
    2143816
  • Fiscal year:
    2022
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Fast and Accurate Statistical Learning and Inference from Large-Scale Data: Theory, Methods, and Algorithms
  • Award number:
    2046874
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: New Statistical Paradigms Reconciling Empirical Surprises in Modern Machine Learning
  • Award number:
    2042473
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Nonconvex Optimization for Statistical Estimation and Learning: Conditioning, Dynamics, and Nonsmoothness
  • Award number:
    2047637
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant
CAREER: Smooth statistical distances for a scalable learning theory
  • Award number:
    2046018
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Project category:
    Continuing Grant