Optimization methods for deep learning: training and testing in machine learning

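Basic information

Grant number:
2282418
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2019
Funding country:
United Kingdom
Start and end dates:
2019 to (no data)
Project status:
Closed

Source:
https://gtr.ukri.org/projects?ref=studentship-2282418
Keywords:
Optimization methods deep learning training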

Project abstract

This project falls within the EPSRC numerical analysis research area. It is undertaken with the industrial partner NAG.

Optimization problems form the modelling and numerical core of machine learning and statistical methodologies, such as the training of supervised machine learning classifiers. In such applications, a large amount of data (feature vectors) is available that has already been classified (that is, labelled). A parameterised classifier is then selected and trained on this data: values of the parameters are calculated so that the output of the classifier at the given features matches their labels in some optimal way. The resulting classifier is then used for testing, that is, to label/classify unseen data.

The training problem is formulated as an optimization problem that minimises, for example, the average number of errors that the classifier makes on the training set. Various formulations of the optimization problem are used, most commonly with a continuous loss function that measures the error at each data point and either a (deterministic) finite sum or a (probabilistic) expectation of the loss terms as the total error. This total error may be convex (as in the case of binary classification) but, increasingly, it is nonconvex owing to the prevalence of deep learning applications.

The scale of the resulting optimization problem is commonly huge, with millions of parameters and millions of terms in the objective sum. This makes the calculation of even a single function or gradient value prohibitively expensive and leads to the need for inexact optimization algorithms that effectively exploit problem structure. The practical method of choice in ML applications is the (batch) stochastic gradient method (Robbins-Monro, 1951), which computes the gradients of only a small, randomly chosen number of the loss terms and controls both variance and convergence by means of a predefined stepsize.
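As a rough illustration of the batch stochastic gradient method described above, the following minimal sketch trains a two-parameter linear classifier on a finite-sum objective. The logistic loss, the synthetic data, and all names (`make_data`, `term_loss_and_grad`, `average_loss`, `minibatch_sgd`) are assumptions made for this example and are not taken from the project.

```python
import math
import random


def make_data(n, seed=0):
    """Hypothetical synthetic data: feature vectors x in R^2, labels y in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        y = 1.0 if x[0] + x[1] > 0 else -1.0
        data.append((x, y))
    return data


def term_loss_and_grad(w, x, y):
    """One loss term log(1 + exp(-y * <w, x>)) and its gradient with respect to w."""
    m = y * (w[0] * x[0] + w[1] * x[1])
    e = math.exp(-abs(m))  # always in (0, 1], so no overflow
    loss = max(-m, 0.0) + math.log1p(e)
    s = e / (1.0 + e) if m >= 0 else 1.0 / (1.0 + e)  # equals 1 / (1 + exp(m))
    return loss, (-y * s * x[0], -y * s * x[1])


def average_loss(w, data):
    """Finite-sum objective: the mean of the loss terms over all data points."""
    return sum(term_loss_and_grad(w, x, y)[0] for x, y in data) / len(data)


def minibatch_sgd(data, stepsize=0.5, batch_size=8, steps=500, seed=1):
    """Mini-batch SGD with a predefined, constant stepsize (an assumed choice)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        # Only a small, randomly chosen subset of the loss terms is differentiated.
        batch = rng.sample(data, batch_size)
        g0 = g1 = 0.0
        for x, y in batch:
            _, (d0, d1) = term_loss_and_grad(w, x, y)
            g0 += d0
            g1 += d1
        w[0] -= stepsize * g0 / batch_size
        w[1] -= stepsize * g1 / batch_size
    return w


if __name__ == "__main__":
    data = make_data(200)
    w = minibatch_sgd(data)
    print("initial loss:", average_loss([0.0, 0.0], data))
    print("final loss:  ", average_loss(w, data))
```

Each step costs `batch_size` gradient evaluations rather than one per data point, which is the reason the method scales to very large sums. The price is a noisy search direction whose variance is controlled here only by the stepsize and the batch size.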
A grand challenge in this area is how to augment stochastic gradient methods with inexact second-order derivative information, so as to obtain more efficient methods, especially in the nonconvex case of deep learning, in terms of both higher accuracy and robustness to ill-conditioning. In this project, we will investigate ways to approximate second-order information in the finite-sum structure of ML optimization problems, from subsampling second-order derivatives to approximating them by differences in batch gradients, as in block (stochastic) quasi-Newton and Gauss-Newton methods. In place of the usual predefined stepsize, we will consider the impact of more sophisticated stepsize techniques from classical optimization that adapt to the local optimization landscape, such as variable/adaptive trust-region radius, regularization and linesearch. We will also consider preconditioning techniques that incorporate inexpensive second-derivative information to improve the performance of first-order methods. Finally, we will investigate parallel and decentralised implementations of these methods, which are a challenge especially for higher-order techniques.

Potential outcomes:
(i) State-of-the-art deep learning and optimization formulations and methods for machine learning.
(ii) Novel optimization methods that use inexact (deterministic/stochastic) problem information.
(iii) Evaluation of methods for deep neural net training and testing.
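Two of the ideas named in the project plan above, subsampled second-order derivatives and a linesearch in place of a predefined stepsize, can be combined in a small sketch. It is not the project's method. It assumes a two-parameter logistic-regression objective, an exact gradient, a Hessian estimated from a random subsample and regularised by `reg`, and Armijo backtracking. All function names and parameter values are hypothetical.

```python
import math
import random


def make_data(n, seed=0):
    """Hypothetical noisy labelled data: x in R^2, y in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        y = 1.0 if x[0] + x[1] + rng.gauss(0.0, 0.5) > 0 else -1.0
        data.append((x, y))
    return data


def sigmoid(t):
    """Logistic function, evaluated without overflow."""
    e = math.exp(-abs(t))
    return 1.0 / (1.0 + e) if t >= 0 else e / (1.0 + e)


def loss(w, data):
    """Finite-sum objective: average of log(1 + exp(-y * <w, x>))."""
    total = 0.0
    for x, y in data:
        m = y * (w[0] * x[0] + w[1] * x[1])
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(data)


def gradient(w, data):
    """Exact gradient of the average loss over `data`."""
    g0 = g1 = 0.0
    for x, y in data:
        s = sigmoid(-y * (w[0] * x[0] + w[1] * x[1]))
        g0 -= y * s * x[0]
        g1 -= y * s * x[1]
    return g0 / len(data), g1 / len(data)


def subsampled_hessian(w, sample, reg):
    """Hessian of the average loss over `sample`, plus reg * I, as entries (a, b, d)."""
    a = b = d = 0.0
    for x, _y in sample:
        s = sigmoid(w[0] * x[0] + w[1] * x[1])
        c = s * (1.0 - s)  # curvature weight; the label drops out because y**2 == 1
        a += c * x[0] * x[0]
        b += c * x[0] * x[1]
        d += c * x[1] * x[1]
    k = len(sample)
    return a / k + reg, b / k, d / k + reg


def newton_linesearch(data, iters=10, sample_size=32, reg=1e-3, c1=1e-4, seed=1):
    """Regularised subsampled-Hessian Newton steps with Armijo backtracking."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    history = [loss(w, data)]
    for _ in range(iters):
        g0, g1 = gradient(w, data)
        a, b, d = subsampled_hessian(w, rng.sample(data, sample_size), reg)
        det = a * d - b * b  # positive: the regularised matrix is positive definite
        p0 = (-g0 * d + b * g1) / det  # solve [[a, b], [b, d]] p = -g
        p1 = (-a * g1 + b * g0) / det
        slope = g0 * p0 + g1 * p1  # directional derivative, non-positive
        t = 1.0
        for _backtrack in range(30):
            trial = [w[0] + t * p0, w[1] + t * p1]
            f_trial = loss(trial, data)
            if f_trial <= history[-1] + c1 * t * slope:  # sufficient decrease
                w = trial
                history.append(f_trial)
                break
            t *= 0.5  # shrink the step and try again
    return w, history


if __name__ == "__main__":
    w, history = newton_linesearch(make_data(200))
    print("loss per accepted step:", [round(f, 4) for f in history])
```

Because a step is accepted only when the sufficient-decrease test passes, the recorded loss never increases, whatever the quality of the subsampled Hessian. A trust-region variant would instead adjust a radius bounding the step length, and a fully stochastic variant would also subsample the gradient and the function values used in the test.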
