Hyper-fast hyper-parameter tuning for the next generation of machine learning

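Basic information

Grant number:
RGPIN-2022-03669
Principal investigator:
Schmidt, Mark
Amount:
$40,100
Host institution:
University of British Columbia
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Project period:
2022-01-01 to 2023-12-31
Project status:
Completed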

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=749904
Keywords:
Hyper fast hyper parameter tuning

Project abstract

Machine learning (ML) uses large amounts of data to set the parameters of a model, and is having success in a growing number of applications from speech recognition to computer vision to language translation. But ML has a problem with "hyper-parameters", the variables affecting the learning algorithm and the structure of the model. We typically still need to spend enormous amounts of time tuning hyper-parameters in order to obtain good performance. It was recently highlighted that the cost of tuning current ML language models has a carbon output comparable to that of five cars over their lifetimes. This situation will only get worse as the next generation of models will have far more hyper-parameters. We will not be able to solve many important problems until we address this issue.

The long-term goal of this research program is to develop algorithms that can train ML models on enormous datasets in a short amount of time. During the last 6 years we have focused on developing algorithms that converge faster, but we now need to turn to the pressing issue of dealing with the hyper-parameters.

The short-term goal of this project is to address the issues associated with the two typical sources of hyper-parameters: (A) The algorithm used to do the learning typically has hyper-parameters, such as the learning rate. (B) The model that is being used typically has hyper-parameters, such as the depth of a deep learning model.

We have already made progress on (A). In 2019 we gave the first method that automatically tunes one of the most important hyper-parameters, the learning rate, during training. This method is guaranteed to perform at least as well as the best fixed learning rate for modern "over-parameterized" models. We plan to develop algorithms that are insensitive to other learning hyper-parameters and that do not require the over-parameterized assumption.

My lab is uniquely positioned to address problem (B). Current strategies for tuning the parameters of network architectures tend to use discrete parameterizations of the hyper-parameters. In work 10 years ago I showed a variety of ways to use continuous parameterizations to yield high-quality approximate solutions to learning problems that involve searching over graph and hyper-graph structures. These continuous relaxations gave enormous speedups over previous approaches based on discrete parameterizations, and we will develop new methods like these to address tuning model hyper-parameters.

There is a huge potential impact for the proposed research, with potential applications ranging from medicine to scientific discovery to self-driving cars. In ML we want to build algorithms and models that work across many applications (we want to "build a better hammer" that can be used for many tasks). Thus, breakthroughs on the algorithms underlying ML models immediately impact many applications that use ML (or will use it in the future).
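To make concrete the idea of a learning rate that is tuned during training rather than fixed in advance, here is a minimal sketch. The abstract does not name the 2019 method, so this is not a reproduction of it. It is a generic backtracking (Armijo) line search applied to plain gradient descent on a toy quadratic. The function names (`armijo_step`, `minimize`, `loss`, `loss_grad`), the constants, and the objective are all assumptions chosen for illustration.

```python
def armijo_step(f, grad, x, step, c=0.5, shrink=0.5, max_backtracks=50):
    """Take one gradient step, shrinking the step size until the Armijo
    sufficient-decrease condition holds. Returns (new_x, accepted_step)."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    fx = f(x)
    for _ in range(max_backtracks):
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        # Armijo condition: the loss must drop by at least c * step * ||g||^2.
        if f(candidate) <= fx - c * step * g_sq:
            return candidate, step
        step *= shrink
    return x, 0.0  # no acceptable step found; leave x unchanged


def minimize(f, grad, x0, initial_step=1.0, iterations=500, grow=2.0):
    """Gradient descent whose step size is chosen automatically each iteration."""
    x = list(x0)
    step = initial_step
    for _ in range(iterations):
        x, accepted = armijo_step(f, grad, x, step)
        if accepted == 0.0:
            break
        # Start the next search from a larger step so the step size can recover.
        step = accepted * grow
    return x


# Hypothetical toy objective: a badly scaled quadratic with minimum at (1, -2).
def loss(x):
    return 0.5 * (10.0 * (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


def loss_grad(x):
    return [10.0 * (x[0] - 1.0), x[1] + 2.0]


solution = minimize(loss, loss_grad, [0.0, 0.0])
print(solution)
```

No learning rate is hand-tuned here: the initial step of 1.0 is too large for the steep coordinate, and the search shrinks it automatically. A stochastic variant would evaluate the condition on mini-batches, which this sketch does not attempt.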

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Schmidt, Mark

Experimental quantification of the effect of Mg on calcite-aqueous fluid oxygen isotope fractionation

DOI:
10.1016/j.chemgeo.2012.03.027
Published:
2012-06-05
Journal:
CHEMICAL GEOLOGY
Impact factor:
3.9
Authors:
Mavromatis, Vasileios; Schmidt, Mark; Oelkers, Eric H.
Corresponding author:
Oelkers, Eric H.

Convex Optimization for Big Data

DOI:
10.1109/msp.2014.2329397
Published:
2014-09-01
Journal:
IEEE SIGNAL PROCESSING MAGAZINE
Impact factor:
14.9
Authors:
Cevher, Volkan; Becker, Stephen; Schmidt, Mark
Corresponding author:
Schmidt, Mark

A Portable and Autonomous Mass Spectrometric System for On-Site Environmental Gas Analysis

DOI:
10.1021/acs.est.6b03669
Published:
2016-12-20
Journal:
ENVIRONMENTAL SCIENCE & TECHNOLOGY
Impact factor:
11.4
Authors:
Brennwald, Matthias S.; Schmidt, Mark; Kipfer, Rolf
Corresponding author:
Kipfer, Rolf

Dimensions in major depressive disorder and their relevance for treatment outcome.

DOI:
10.1016/j.jad.2013.10.020
Published:
2014-02
Journal:
JOURNAL OF AFFECTIVE DISORDERS
Impact factor:
6.6
Authors:
Vrieze, Elske; Demyttenaere, Koen; Bruffaerts, Ronny; Hermans, Dirk; Pizzagalli, Diego A.; Sienaert, Pascal; Hompes, Titia; de Boer, Peter; Schmidt, Mark; Claes, Stephan
Corresponding author:
Claes, Stephan