BIGDATA: Collaborative Research: F: Stochastic Approximation for Subspace and Multiview Representation Learning

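Basic Information

Award number:
1546500
Principal investigator:
Nathan Srebro
Amount:
About $394,500
Host institution:
Toyota Technological Institute at Chicago
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2015
Funding country:
United States
Start and end dates:
2015-09-01 to 2021-08-31
Project status:
Completed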

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546500&HistoricalAwards=false
Keywords:
BIGDATA Collaborative Research Stochastic Approximation

Project Abstract

Unsupervised learning of useful features, or representations, is one of the most basic challenges of machine learning. Unsupervised representation learning techniques capitalize on unlabeled data, which is often cheap and abundant and sometimes virtually unlimited. The goal of these ubiquitous techniques is to learn a representation that reveals intrinsic low-dimensional structure in data, disentangles underlying factors of variation by incorporating universal AI priors such as smoothness and sparsity, and is useful across multiple tasks and domains. This project aims to develop new theory and methods for representation learning that can easily scale to large datasets. In particular, this project is concerned with methods for large-scale unsupervised feature learning, including Principal Component Analysis (PCA) and Partial Least Squares (PLS). To capitalize on massive amounts of unlabeled data, this project will develop appropriate computational approaches and study them in the "data laden" regime. Therefore, instead of viewing representation learning as a dimensionality reduction technique and focusing on an empirical objective on finite data, these methods are studied with the goal of optimizing a population objective based on samples. This view suggests using Stochastic Approximation approaches, such as Stochastic Gradient Descent (SGD) and Stochastic Mirror Descent, that are incremental in nature and process each new sample with a computationally cheap update. Furthermore, this view enables a rigorous analysis of the benefits of stochastic approximation algorithms over traditional finite-data methods. The project aims to develop stochastic approximation approaches to PCA, PLS, and related problems and extensions, including deep and sparse variants, and to analyze these problems in the data-laden regime.
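
As an illustration of the incremental, one-sample-at-a-time update the abstract describes, here is a minimal sketch of Oja's rule, a classical stochastic approximation method for the leading principal component. It is not taken from the project itself: the function names, the step-size schedule `1 / (t + 10)`, and the synthetic 3-dimensional data stream are all assumptions chosen for the example.

```python
import math
import random


def oja_top_component(samples, dim, step=lambda t: 1.0 / (t + 10.0)):
    """Estimate the leading principal direction with one pass of Oja's rule.

    Each sample x triggers the update w <- normalize(w + eta_t * (x . w) * x),
    a stochastic gradient step on the population objective E[(x . w)^2].
    The step-size schedule is an illustrative assumption.
    """
    rng = random.Random(0)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for t, x in enumerate(samples):
        proj = sum(xi * wi for xi, wi in zip(x, w))  # x . w
        eta = step(t)
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]  # project back onto the unit sphere
    return w


def synthetic_stream(n, seed=1):
    """Yield zero-mean 3-D samples whose variance is largest along axis 0."""
    rng = random.Random(seed)
    scales = (3.0, 1.0, 0.5)  # per-axis standard deviations (hypothetical data)
    for _ in range(n):
        yield [rng.gauss(0.0, s) for s in scales]


# The stream is consumed once; no sample is stored or revisited.
w = oja_top_component(synthetic_stream(20000), dim=3)
alignment = abs(w[0])  # |cosine| between the estimate and the true top direction
```

Each update costs time linear in the dimension and needs no stored covariance matrix, which is the contrast with finite-data methods that the abstract draws. On this synthetic stream, `alignment` should end up close to 1.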

Project Outcomes

Journal papers (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Efficiently Learning Adversarially Robust Halfspaces with Noise

DOI:
Publication date:
2020-05
Journal:
ArXiv
Impact factor:
0
Authors:
Omar Montasser; Surbhi Goel; Ilias Diakonikolas; N. Srebro
Corresponding authors:
Omar Montasser; Surbhi Goel; Ilias Diakonikolas; N. Srebro

Guaranteed validity for empirical approaches to adaptive data analysis

DOI:
Publication date:
2020
Journal:
International Conference on Artificial Intelligence and Statistics
Impact factor:
0
Authors:
Rogers Ryan; Roth Aaron; Smith Adam; Srebro Nathan; Thakkar Om; Woodworth Blake
Corresponding authors:
Thakkar Om; Woodworth Blake

Does invariant risk minimization capture invariance?

DOI:
Publication date:
2021
Journal:
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
Impact factor:
0
Authors:
Kamath Pritish; Tangella Akilesh
Corresponding authors:
Kamath Pritish; Tangella Akilesh

Efficient coordinate-wise leading eigenvector computation