
Iterative Algorithms for Statistics: From Convergence Rates to Statistical Accuracy


Basic Information

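Award number: 2301050
Principal investigator: Martin Wainwright
Amount: $300,000
Host institution: Massachusetts Institute of Technology
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2022
Funding country: United States
Start and end dates: 2022-10-01 to 2023-06-30
Project status: Completed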

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2301050&HistoricalAwards=false
Keywords: Iterative Algorithms Statistics Convergence Rates

Project Abstract

Science, engineering, and industry are all being revolutionized by the modern era of data science, in which increasingly large and rich forms of data are now available. The applications are diverse and broadly significant, including data-driven discovery in astronomy, statistical machine learning approaches to drug design, and decision-making in robotics and automated driving, among many others. This grant supports research on techniques and models for learning from such massive datasets, leading to computationally efficient algorithms that can be scaled to the large problem instances encountered in practice. The PI plans to integrate research and education through the involvement of graduate students in the research, the inclusion of the research results in courses at UC Berkeley and in publicly available web-based course materials, as well as in mini courses at summer schools and workshops. This project will also provide mentoring and support for graduate students and postdocs who are female or belong to URM communities.

Many estimates in statistics are defined via an iterative algorithm applied to a data-dependent objective function (e.g., the EM algorithm for missing data and latent variable models; gradient-based methods and Newton's method for M-estimation; boosting algorithms used in non-parametric regression). This project pursues several research thrusts centered on exploiting the dynamics of these algorithms in order to answer statistical questions, with applications to statistical parameter estimation; selection of the number of components in a mixture model; and optimal bias-variance trade-offs in non-parametric regression.

In more detail, the aims of this project include (i) providing a general analysis of the EM algorithm for non-regular mixture models and related singular problems, in which very slow (sub-geometric) convergence is typically observed; (ii) developing a principled method for model selection based on the convergence rate of EM, and proving theoretical guarantees on its performance; (iii) developing a general theoretical framework for combining the convergence rate of an algorithm with bounds on its (in)stability so as to establish bounds on the statistical estimation error; and (iv) providing a complete analysis of the full boosting path for various types of boosting updates, including kernel boosting and gradient-boosted regression trees, and analyzing the "overfitting" regime to elucidate conditions under which overfitting does or does not occur.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
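
The slow-convergence phenomenon mentioned in aim (i) can be illustrated with a small numerical sketch. This is not taken from the award itself; it is a standard textbook setting chosen here as an assumption: a symmetric two-component Gaussian location mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1) with known unit variances. All function names are hypothetical, and evenly spaced normal quantiles stand in for random draws so that the run is deterministic. When the data come from well-separated components, the EM iterates settle within a few steps; when the data come from a single Gaussian (so the two-component fit is over-specified), the iterates drift toward zero much more slowly.

```python
import math
from statistics import NormalDist


def normal_quantiles(n):
    """Deterministic stand-in for n draws from N(0, 1): evenly spaced quantiles."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]


def make_mixture_sample(theta_star, n_per_component=500):
    """Pseudo-sample from 0.5*N(theta_star, 1) + 0.5*N(-theta_star, 1)."""
    zs = normal_quantiles(n_per_component)
    return [z + theta_star for z in zs] + [z - theta_star for z in zs]


def em_update(theta, xs):
    """One EM step for the symmetric two-component location mixture.

    E-step: the posterior weight w of the +theta component for a point x is
    sigmoid(2 * theta * x), so 2 * w - 1 equals tanh(theta * x).
    M-step: the new theta is the signed weighted mean, mean(x * tanh(theta * x)).
    """
    return sum(x * math.tanh(theta * x) for x in xs) / len(xs)


def run_em(xs, theta0=1.0, steps=50):
    """Return the list of EM iterates [theta_0, ..., theta_steps]."""
    path = [theta0]
    for _ in range(steps):
        path.append(em_update(path[-1], xs))
    return path


# Well-separated components (theta_star = 2): fast convergence.
fast = run_em(make_mixture_sample(theta_star=2.0))
# Single Gaussian fitted with two components (theta_star = 0): slow convergence.
slow = run_em(make_mixture_sample(theta_star=0.0))

for t in (1, 5, 20, 50):
    print(t, round(fast[t], 4), round(slow[t], 4))
```

Comparing the two printed columns shows the contrast: the first stabilizes almost immediately, while the second keeps shrinking across all 50 iterations. The update rule follows from the closed-form posterior weights of this particular model and would need to be re-derived for other mixtures.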
