
Machine Learning through the Lenses of Optimal Transportation

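Basic information

Grant number:
2744976
Principal investigator:
Amount:
--
Host institution:
Durham University
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Not concluded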

Source:
https://gtr.ukri.org/projects?ref=studentship-2744976
Keywords:
Machine Learning through Lenses Optimal

Project abstract

This research project proposes to investigate some analytical and numerical aspects of the training of wide multi-layered neural networks using the modern theory of optimal transportation. Optimal mass transportation is a relatively new area of mathematics, but it has gained huge momentum in the past twenty years or so, culminating in the 2018 Fields Medal of A. Figalli. This theory has had great impact and influence on fields such as partial differential equations, probability theory and statistics, geometry and data science, with important applications in meteorology, economics, biology and the social sciences, to list a few.

Applied to machine learning, optimal mass transportation has had early successes in the framework of supervised and unsupervised learning. The results have quickly found industrial applications, such as models based on variational auto-encoders and generative adversarial networks.

The key objective in training a neural network is to identify a function that will accurately evaluate unseen data based on a given training set. Unfortunately, identifying this prediction function is itself a challenging optimisation problem, which can be solved only approximately and with many constraints through numerical methods. In the past, this has been achieved through techniques such as backpropagation and stochastic gradient descent. However, such methods do not scale well as the number of neurons increases, making them infeasible for many practical applications.

Thanks to recent developments in optimal transport, it is now possible to find prediction functions for infinitely wide single-layer neural networks. This is done by solving partial differential equations derived from so-called Wasserstein gradient flows, a key mechanism in optimal transport. Thus far, however, this mathematical framework is only applicable to single-layer neural networks, a key limitation acknowledged recently by Figalli et al. To address this open problem, we seek to develop, as the backbone of this project, a mathematical framework that uses Wasserstein gradient flows to find prediction functions for neural networks with two or more layers.

The tools of optimal transportation thus allow us to approximate the very-many-neuron problem, not by fewer neurons, but by a continuum (i.e. infinitely many) of them, reminiscent of the mean-field models in statistical physics. Our second objective is to quantify how good these approximations are, not only in the usual analytical sense of obtaining qualitative bounds that are often of little practical use, but also through sharp bounds that align well with what is observed in applications. For this, we plan to use techniques from mean-field games (a new area initiated by another Fields medallist, P.-L. Lions, and his collaborators).

Numerical computations will form an important part of this project, in the first instance to test our mathematical framework as it develops (not whether it is correct, but how directly applicable it is to real-life problems). Existing computational techniques, however, can only handle discrete (if many) neurons. Over the course of this project, as our third objective, we will need to develop numerical models, along with the requisite computational techniques, that couple discrete neurons with (an approximation of) a continuum of neurons. This will start with a single layer, progressing to multiple layers as we gain experience.
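
The single-layer setting mentioned in the abstract can be illustrated with a small numerical sketch. It is not taken from the project itself. It assumes a one-hidden-layer network written as an average over N "particles" (neurons), a tanh activation, a squared loss, and a toy regression target; the function names `predict`, `particle_step` and `train`, the step size and the particle count are all hypothetical choices made for illustration. Under these assumptions, plain gradient descent on the particles, with the gradient rescaled by N, can be read as a forward-Euler discretisation of a Wasserstein gradient flow on the empirical distribution of neuron parameters.

```python
import numpy as np


def predict(params, x):
    """Mean-field network: the output is the average of N single-neuron outputs."""
    a, w, b = params                              # each has shape (N,)
    return np.mean(a * np.tanh(np.outer(x, w) + b), axis=1)


def loss(params, x, y):
    """Half mean squared error over the training set."""
    return 0.5 * np.mean((predict(params, x) - y) ** 2)


def particle_step(params, x, y, lr):
    """One explicit Euler step of particle gradient descent.

    Each gradient is multiplied by N (the mean-field time scaling), so the
    1/N factor from the averaged output cancels and every particle moves
    at an O(1) speed regardless of the width.
    """
    a, w, b = params
    n_samples = x.size
    h = np.tanh(np.outer(x, w) + b)               # (n_samples, N) activations
    resid = h @ a / a.size - y                    # (n_samples,) prediction error
    dh = (1.0 - h ** 2) * a                       # d(a * tanh(z)) / dz per sample, per particle
    grad_a = resid @ h / n_samples                # N * dL/da_i
    grad_w = resid @ (dh * x[:, None]) / n_samples  # N * dL/dw_i
    grad_b = resid @ dh / n_samples               # N * dL/db_i
    return a - lr * grad_a, w - lr * grad_w, b - lr * grad_b


def train(n_particles=200, n_steps=2000, lr=0.1, seed=0):
    """Fit a toy 1-D target (an assumption) and return (initial, final) loss."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2.0, 2.0, 64)
    y = np.sin(2.0 * x)
    # Particles sampled i.i.d. from a standard normal initial distribution.
    params = tuple(rng.normal(size=n_particles) for _ in range(3))
    initial = loss(params, x, y)
    for _ in range(n_steps):
        params = particle_step(params, x, y, lr)
    return initial, loss(params, x, y)


if __name__ == "__main__":
    initial_loss, final_loss = train()
    print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Increasing `n_particles` brings the empirical distribution closer to the continuum-of-neurons limit the abstract refers to. The sketch handles only finitely many discrete neurons and a single hidden layer, which is exactly the regime the abstract describes as already tractable; it says nothing about the multi-layer case the project targets.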
