AF: RI: Medium: Collaborative Research: Understanding and Improving Optimization in Deep and Recurrent Networks

Basic Information

Award number:
1764032
Principal investigator:
Nathan Srebro
Amount:
$541.1K
Host institution:
Toyota Technological Institute at Chicago
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Period:
2018-08-01 to 2024-09-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1764032&HistoricalAwards=false
Keywords:
AF RI Medium Collaborative Research

Project Abstract

Machine learning using deep neural networks has recently demonstrated broad empirical success. Despite this success, the optimization procedures that fit deep neural networks to data are still poorly understood. Besides playing a crucial role in fitting deep neural networks to data, optimization also strongly affects the model's ability to generalize from training examples to unseen data. This project will establish a working theory for why and when large artificial neural networks train and generalize well, and use this theory to develop new optimization methods. The utility of the new methods will be demonstrated in applications involving language, speech, biological sequences and other sequence data. The project will involve training of graduate and undergraduate students, and the project leaders will offer tutorials aimed at both the machine learning community and other researchers and engineers using machine learning tools.

In order to establish a theory of why and when non-convex optimization works well when training deep networks, both empirical top-down and analytic bottom-up approaches will be pursued. The top-down approach will involve phenomenological analysis of large-scale deep models used in practice, both when presented with real data and when presented with data specifically crafted to test the behavior of the network. The bottom-up approach will involve precise analytic investigation of increasingly complex models, starting with linear models and non-convex matrix factorization, progressing through linear neural networks and models with a small number of hidden layers, and eventually reaching deeper and more complex networks. The theory developed aims to be both explanatory and actionable, and will be used to derive new optimization methods and modifications to architectures that aid in optimization and generalization. A particularly important testbed is the case of recurrent neural networks. Recurrent neural networks are powerful sequence models that maintain state as they process an input sequence and are used for sequence data. Particularly challenging to optimize, recurrent neural networks still leave much room for a stronger principled understanding, which the project aims to provide.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (33)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Characterizing Implicit Bias in Terms of Optimization Geometry

DOI:
Publication date:
2018-02
Journal:
ArXiv
Impact factor:
0
Authors:
Suriya Gunasekar;Jason D. Lee;Daniel Soudry;N. Srebro
Corresponding authors:
Suriya Gunasekar;Jason D. Lee;Daniel Soudry;N. Srebro

Pessimism for Offline Linear Contextual Bandits using Confidence Sets

DOI:
Publication date:
2022
Journal:
Advances in neural information processing systems
Impact factor:
0
Authors:
Li, Gene;Ma, Cong;Srebro, Nati
Corresponding author:
Srebro, Nati

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

DOI:
Publication date:
2021-02
Journal:
ArXiv
Impact factor:
0
Authors:
Shahar Azulay;E. Moroshko;M. S. Nacson;Blake E. Woodworth;N. Srebro;A. Globerson;Daniel Soudry
Corresponding authors:
Shahar Azulay;E. Moroshko;M. S. Nacson;Blake E. Woodworth;N. Srebro;A. Globerson;Daniel Soudry

Convergence of Gradient Descent on Separable Data