Non-convex Optimization for Machine Learning: Theory and Methods

Basic Information

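Grant number: RGPIN-2019-06167
Principal investigator: Erdogdu, Murat
Amount: $28,400
Host institution: University of Toronto
Host institution country: Canada
Program: Discovery Grants Program - Individual
Fiscal year: 2021
Funding country: Canada
Period: 2021-01-01 to 2022-12-31
Status: Completed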

Source: https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=738754
Keywords: Non-convex optimization, Machine learning

Project Abstract

Non-convex optimization has become an indispensable component of artificial intelligence because of the structural properties of popular machine learning models. Owing to their key role and empirical success in numerous learning tasks, non-convex methods have been a major focus of recent optimization research. Many important characteristics of machine learning models, such as generalization and fast trainability, are inherited from these optimization methods; thus, a good understanding of these algorithms is crucial. To this end, we use appropriate tools from statistics, diffusion theory, and differential geometry to explain the empirical success of popular non-convex methods. We further propose new paradigms for designing more efficient algorithms in regimes where scalability is a structural issue that can nevertheless be resolved by appealing to non-convex methods. The main purpose of my research agenda is to improve our understanding of non-convex algorithms, which have become the dominant optimization tools in machine learning. We further pursue several directions that build on our theoretical findings to design fast and efficient algorithms for practical problems. The overall research plan can be broken into three parts, to be pursued simultaneously: (1) theoretical analysis of commonly used non-convex optimization algorithms, (2) design of efficient optimization algorithms for machine learning, and (3) application of these methods to real problems. For example, in recent work we established a non-asymptotic analysis of discretized diffusions for non-convex optimization tasks. Our results provide explicit, finite-time rates of convergence to global minima (item 1 above). Building on this, we show that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions that globally optimize convex and non-convex functions not covered by the existing literature (item 2 above).
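As a rough illustration of what a "discretized diffusion" for non-convex optimization looks like, the sketch below applies the Euler-Maruyama scheme to the overdamped Langevin diffusion dX_t = -∇f(X_t) dt + sqrt(2/β) dW_t, the special case that early work focused on. It is not the proposal's method: general Itô diffusions also allow state-dependent drift and diffusion coefficients, which this sketch does not attempt. The objective, step size `eta`, inverse temperature `beta`, and step count are hypothetical choices for illustration only.

```python
import numpy as np

def objective(x):
    # Hypothetical non-convex objective: a tilted double well with a local
    # minimum near x = +0.96 and the global minimum near x = -1.04.
    return (x**2 - 1.0) ** 2 + 0.3 * x

def grad_objective(x):
    return 4.0 * x * (x**2 - 1.0) + 0.3

def langevin(x0, eta=1e-2, beta=4.0, n_steps=50_000, seed=0):
    """Euler-Maruyama discretization of dX = -grad f(X) dt + sqrt(2/beta) dW.

    The continuous-time diffusion has stationary density proportional to
    exp(-beta * f), which concentrates near global minimizers as beta grows.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    noise_scale = np.sqrt(2.0 * eta / beta)  # std of the injected Gaussian noise
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Gradient step plus Gaussian noise with variance 2 * eta / beta.
        x[k + 1] = x[k] - eta * grad_objective(x[k]) + noise_scale * noise[k]
    return x

# Start inside the basin of the *local* minimum; plain gradient descent
# with a small step size would stop near x = +0.96 from here.
traj = langevin(x0=1.0)
best = traj[np.argmin(objective(traj))]
print(f"best x = {best:.3f}, f(best) = {objective(best):.3f}")
```

The injected noise lets the iterates cross the barrier near x = 0.075, and at this `beta` they favour the deeper well. A larger `beta` concentrates the iterates more tightly around minimizers but makes barrier crossings exponentially rarer, one reason explicit finite-time convergence rates are of interest.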
We complement these results by showing that diffusions designed for a specific objective function can attain better global convergence guarantees, leading to problem-specific algorithm design (item 3 above). In this proposal, we focus on two popular families of non-convex methods in machine learning: (1) diffusion-based and (2) matrix-factorization-based optimization. Early work on diffusion-based non-convex optimization focused on a specific diffusion, Langevin dynamics. Our work considers general Itô diffusions, which offer several benefits, including fast convergence, wide applicability, and better convergence properties. We further study widely used matrix-factorization-based non-convex methods and establish their theoretical guarantees. In both directions, we build on our theory to design efficient and scalable algorithms for various machine learning problems. Applications of these algorithms include recommender systems, inference in graphical models, and neural networks.
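For the matrix-factorization direction, a minimal sketch of the kind of non-convex method involved: low-rank matrix completion (a standard recommender-system formulation) solved by plain gradient descent on the factors U and V. The objective is non-convex in (U, V) jointly, although it is convex in each factor separately. The toy data, rank, learning rate `lr`, and regularization weight `reg` are hypothetical, and the proposal does not specify this particular algorithm.

```python
import numpy as np

def factorize(M, mask, rank, lr=0.005, reg=1e-3, n_iters=4000, seed=0):
    """Gradient descent on the non-convex factored objective

        0.5 * ||mask * (U @ V.T - M)||_F^2 + 0.5 * reg * (||U||_F^2 + ||V||_F^2)

    where mask is 1.0 on observed entries and 0.0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    # Small random initialization; the objective is non-convex in (U, V).
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(n_iters):
        R = mask * (U @ V.T - M)  # residual on observed entries only
        grad_U = R @ V + reg * U
        grad_V = R.T @ U + reg * V
        U, V = U - lr * grad_U, V - lr * grad_V  # simultaneous update
    return U, V

# Hypothetical toy "ratings" matrix: rank 2, 30 users x 20 items,
# with roughly half of the entries observed.
rng = np.random.default_rng(1)
M_true = rng.standard_normal((30, 2)) @ rng.standard_normal((20, 2)).T
mask = (rng.random(M_true.shape) < 0.5).astype(float)

# The solver only ever sees the observed entries.
U, V = factorize(M_true * mask, mask, rank=2)
held_out = 1.0 - mask
rel_err = np.linalg.norm(held_out * (U @ V.T - M_true)) / np.linalg.norm(held_out * M_true)
print(f"relative error on unobserved entries: {rel_err:.2e}")
```

Storing the model as two factors needs (n + m) * rank numbers instead of n * m. This dense sketch still forms `U @ V.T` for simplicity; a scalable implementation would compute residuals only on the observed entries.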

由于流行的机器学习模型的结构特性，非凸优化已成为人工智能不可或缺的组成部分。由于它们在许多学习任务中的关键作用和经验成功，它们已成为最近优化研究的主要焦点。机器学习模型的许多重要特征，如泛化和快速可训练性，都继承自这些优化方法；因此，很好地理解这些算法是至关重要的。为此，我们使用统计学、扩散理论和微分几何中的适当工具来解释流行的非凸方法的经验成功。我们进一步提出了设计更高效算法的新范例，在这种情况下，可扩展性是一个结构性问题，但可以通过吸引非凸方法来解决。我的研究议程的主要目的是提高我们对非凸算法的理解，非凸算法已成为机器学习中占主导地位的优化工具。我们进一步追求几个方向，以建立我们的理论发现，为实际问题设计快速有效的算法。总体研究计划可分为三个部分，同时进行：1-常用非凸优化算法的理论分析，2-设计高效的机器学习优化算法，3-将这些方法应用于实际问题。例如，在最近的一项工作中，我们建立了非凸优化任务的离散扩散的非渐近分析。我们的结果提供了明确的、有限时间的全局最小值收敛率（上文第1项）。在此基础上，我们证明了不同的扩散适用于优化不同类型的凸函数和非凸函数。这使我们能够设计适合于全局优化凸函数和非凸函数的扩散，而现有文献没有涵盖（上文第2项）。我们通过证明为特定目标函数设计的扩散可以获得更好的全局收敛保证，从而实现针对特定问题的算法设计（上文第3项）来补充这些结果。在这个提议中，我们集中在机器学习中两种流行的非凸方法：基于1扩散和基于2矩阵分解的优化。基于扩散的非凸优化的早期工作集中在一种名为朗格万动力学的特定扩散上。我们的工作考虑了一般的Ito扩散，它为我们提供了各种好处，包括快速收敛，广泛的适用性和更好的收敛性。进一步研究了广泛应用的基于矩阵分解的非凸方法，并建立了它们的理论保证。对于这两个方向，我们都以我们的理论为基础，为各种机器学习问题设计了高效且可扩展的算法。这些算法的应用包括推荐系统、图形模型推理、神经网络等。