CIF: Small: Towards Robust Statistical Learning: Theory and Algorithms

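Basic Information

Award number:
1908905
Principal investigator:
Stanislav Minsker
Amount:
approximately $351,300
Host institution:
University of Southern California
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Project period:
2019-10-01 to 2023-09-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1908905&HistoricalAwards=false
Keywords: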
CIF Small Towards Robust Statistical

Project Abstract

Machine learning algorithms are used to automate various tasks by finding patterns in the existing data. The mathematical analysis of machine learning algorithms starts by assuming that the available dataset is described by a model with certain properties. However, as real-world data often do not satisfy the model assumptions exactly, there is a need to reduce the gap between the "mathematical" and "real" worlds by weakening the mathematical assumptions. The concept of robustness plays a central role in understanding this gap. First, the project will formulate principles for building robust algorithms. The project will then apply these principles to address problems related to the existence of mathematically justified and computationally efficient robust methods for prediction and classification tasks, which are among the most popular problems solved by machine learning algorithms. The project will also support undergraduate research by training students to apply advanced methods to the analysis of modern data sets. Additional efforts will be made to establish closer ties between the academic and industry machine learning research communities.

One part of the project is devoted to robust empirical risk minimization. Empirical risk minimization is one of the fundamental concepts underlying modern mathematical statistics and statistical learning algorithms, including regression and maximum likelihood estimation. However, empirical risk minimization is not robust in many scenarios, with a single "atypical point" amongst the observations possibly significantly affecting performance. The work done in the course of this project will lead to algorithms that avoid explicit outlier detection and removal, and which instead take advantage of existing or purposefully induced symmetries in the distribution of the data. The analysis of these new algorithms will require the development of novel techniques related to Bahadur-type representations of robust estimators, and of new generalizations of the median-of-means principle.

Another part of the project aims at developing robust modifications of the Federated Learning algorithm, originally designed as a communication-effective alternative to the standard centralized datacenter framework. The project will design new and robust versions of the Federated Learning algorithm that provably work in the challenging scenario where the input data have different distributions.

Finally, the investigator will address inferential problems in robust learning by devising robust versions of posterior distributions that are central objects in Bayesian statistics; he will study the Bernstein-von Mises theorem for these robust posteriors, a fundamental result connecting the frequentist and Bayesian methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
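
The abstract names the median-of-means principle and notes that a single "atypical point" can badly distort a standard estimate. As a rough illustration only, the following Python sketch shows the classical median-of-means estimator of a mean, not the generalizations the project proposes. The function name `median_of_means`, the block count, and the synthetic sample are all assumptions made for this example.

```python
import random
import statistics


def median_of_means(values, num_blocks, seed=0):
    """Classical median-of-means estimate of the mean (illustrative sketch).

    Shuffle the data, split it into num_blocks groups of near-equal size,
    average each group, and return the median of the group averages.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled points round-robin into num_blocks groups.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)


# Hypothetical data: 99 observations near 1.0 plus one gross outlier.
rng = random.Random(42)
sample = [rng.gauss(1.0, 0.1) for _ in range(99)] + [1000.0]

plain_mean = statistics.fmean(sample)        # pulled far from 1.0 by the outlier
mom = median_of_means(sample, num_blocks=10)  # stays close to 1.0

print(f"sample mean:     {plain_mean:.3f}")
print(f"median of means: {mom:.3f}")
```

With ten blocks, the outlier can contaminate only one block average, so the median of the block averages is determined by uncontaminated blocks. No outlier is detected or removed explicitly, which is the property the abstract emphasizes.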
