
Implicit Bias and Low Complexity Networks (iLOCO)

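Basic information

Grant number:
464121491
Principal investigator:
Professor Dr. Massimo Fornasier
Funding amount:
--
Host institution:
Mathematisches Institut
Host institution country:
Germany
Project type:
Priority Programmes
Fiscal year:
Funding country:
Germany
Start and end dates:
Project status:
Ongoing

Source:
https://gepris.dfg.de/gepris/projekt/464121491?language=en
Keywords:
Implicit Bias; Low Complexity Networks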

Project abstract

Learned deep neural networks generalize very well despite being trained with a number of training samples that is significantly lower than the number of parameters. This surprising phenomenon runs counter to the traditional wisdom that associates overfitting with poor generalization. In such an overparametrized setting, the loss functional possesses many global minima corresponding to neural networks that interpolate the data, and the learning algorithm induces an implicit bias towards certain favored solutions. By the principle of Occam's razor, it can be anticipated that good generalization is connected with networks of low complexity, and it seems that the standard algorithm of (stochastic) gradient descent favors networks whose complexity is much lower than the number of parameters would suggest. In this project we aim to advance recent first results, which show, for the simple case of deep linear networks, that training via gradient descent promotes an implicit bias towards network weights whose product is a low rank matrix. We intend to significantly extend the theory in this direction and exploit this mechanism for the reliable solution of low rank matrix recovery problems, such as matrix completion. We further aim to contribute to the theoretical foundations of the implicit bias of gradient descent and its stochastic variants for learning deep nonlinear networks. Evidence suggests that the bias is again towards low complexity networks, whose nature we intend to explore. We will leverage the intrinsic low complexity of trained nonlinear networks to design novel algorithms for their compression. In particular, we aim to extend to deep networks recent results that relate approximated second order network differentials to certain non-orthogonal rank one decompositions encoding optimal weights. We plan to prove that the optimal weights can be stably and reliably computed.
As a byproduct we will show robust and unique identification of generic deep networks from a minimal number of samples. Besides advancing on the theoretical level, the project will develop new algorithms and software of practical relevance for machine learning, solution of inverse problems and compression of neural networks for their use on mobile devices.
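
The deep linear network setting mentioned in the abstract can be illustrated with a small numerical experiment. The following is only a minimal sketch under stated assumptions, not the project's method or software: the function names `make_rank_one_problem` and `train_deep_linear`, the depth, the initialization scale, the step size, and the rank-one target are all hypothetical choices made for illustration. It runs plain gradient descent on a product of square weight matrices fitted to a subset of the entries of a matrix, so that the singular values of the learned product can be inspected afterwards.

```python
import numpy as np


def make_rank_one_problem(n=10, observed_fraction=0.5, seed=1):
    """Build a hypothetical rank-one target and a random observation mask."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)
    # Normalize so that the single nonzero singular value of the target is 1.
    target = np.outer(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    mask = (rng.random((n, n)) < observed_fraction).astype(float)
    return target, mask


def train_deep_linear(target, mask, depth=3, init_scale=0.1,
                      lr=0.05, steps=3000, seed=0):
    """Fit W_depth @ ... @ W_1 to the observed entries of `target`.

    Runs plain gradient descent on
        0.5 * || mask * (W_depth ... W_1 - target) ||_F^2
    and returns the final product together with the loss history.
    """
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    # Small random initialization (an assumption of this sketch).
    weights = [init_scale * rng.standard_normal((n, n)) for _ in range(depth)]
    losses = []
    for _ in range(steps):
        # prefix[j] = W_j ... W_1 (identity for j = 0)
        prefix = [np.eye(n)]
        for W in weights:
            prefix.append(W @ prefix[-1])
        # suffix[j] = W_depth ... W_{j+1} (identity for j = depth)
        suffix = [np.eye(n)]
        for W in reversed(weights):
            suffix.append(suffix[-1] @ W)
        suffix.reverse()

        residual = mask * (prefix[-1] - target)
        # Loss is recorded before the update of this step.
        losses.append(0.5 * np.sum(residual ** 2))
        # Chain rule: the gradient for layer j is (later layers)^T R (earlier layers)^T.
        grads = [suffix[j + 1].T @ residual @ prefix[j].T for j in range(depth)]
        weights = [W - lr * g for W, g in zip(weights, grads)]

    product = weights[0]
    for W in weights[1:]:
        product = W @ product
    return product, losses


if __name__ == "__main__":
    target, mask = make_rank_one_problem()
    product, losses = train_deep_linear(target, mask)
    singular_values = np.linalg.svd(product, compute_uv=False)
    print("initial loss:", losses[0], "final loss:", losses[-1])
    print("singular values of the learned product:", np.round(singular_values, 3))
```

Comparing the printed singular values across different values of `depth` and `init_scale` is one way to probe how strongly the learned product concentrates on a few directions. The outcome depends on these settings, and the sketch makes no claim about which rank is reached.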
