Developing Efficient Algorithms based on Smoothing Loss Function for Large Batch Training

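Basic information

Grant number:
20J13997
Principal investigator:
長沼大樹
Amount:
$12,200
Host institution:
Tokyo Institute of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for JSPS Fellows
Fiscal year:
2020
Funding country:
Japan
Project period:
2020-04-24 to 2021-03-31
Project status:
Completed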

Project abstract

Training deep learning models has come to require very long training times in recent years, so shortening training time through large-scale parallelization is an urgent problem. However, large-batch training, in which large-scale parallelization is used to increase the mini-batch size (the amount of data fed in per step), is known to generalize worse than small-batch training. The problem of large-scale parallel training, so-called large-batch training, can in fact be decomposed into two main problems: one is thought to stem from the shape of the loss function, and the other is the demand for fast convergence. To address these two problems, this research analyzed how optimization methods and adaptive learning rate methods affect convergence and generalization performance, and obtained the two results below. These results suggest that optimization methods suited to large-batch training can be designed and that the shortcomings of large-batch training can be mitigated.

First, regarding the shape of the loss function and generalization performance: when adaptive learning rate methods, which can be interpreted as injecting appropriate noise into the optimization, are used, high generalization performance is achieved even at very large batch sizes. It was also confirmed that the larger the batch size, the more effective this noise, interpreted as implicit regularization, becomes, and that it improves the shape of the loss function.

Second, regarding convergence: second-order optimization methods and their approximations were shown to converge better than first-order optimization methods even in practical experimental settings. In addition, for adaptive learning rate methods, adjusting the learning speed of each layer was shown not only to allow convergence within a limited number of iterations but also to reduce the number of iterations at even larger batch sizes.
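The abstract does not say which rule was used to adjust the learning speed of each layer. As a minimal sketch only, the idea can be illustrated with a LARS-style trust ratio, in which each layer's learning rate is scaled by the ratio of its weight norm to its gradient norm. The choice of this rule, the function names (`layerwise_lr`, `sgd_step`) and the parameter values (`base_lr`, `trust_coef`) are assumptions made for illustration and are not taken from the project.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def layerwise_lr(weights, grads, base_lr=0.1, trust_coef=0.001, eps=1e-12):
    """Per-layer learning rate: base_lr * trust_coef * ||w|| / ||g||.

    Assumed LARS-style rule (not confirmed by the source).
    Falls back to base_lr when either norm is exactly zero.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return base_lr
    return base_lr * trust_coef * w_norm / (g_norm + eps)


def sgd_step(layers, grads, base_lr=0.1, trust_coef=0.001):
    """Apply one plain SGD step with a separate learning rate per layer."""
    updated = {}
    for name, w in layers.items():
        lr = layerwise_lr(w, grads[name], base_lr, trust_coef)
        updated[name] = [wi - lr * gi for wi, gi in zip(w, grads[name])]
    return updated


if __name__ == "__main__":
    # Hypothetical toy "layers": one with large weights and small gradients,
    # one with small weights and large gradients.
    layers = {"layer1": [3.0, 4.0], "layer2": [0.3, 0.4]}
    grads = {"layer1": [0.6, 0.8], "layer2": [6.0, 8.0]}
    for name in layers:
        print(name, layerwise_lr(layers[name], grads[name]))
    print(sgd_step(layers, grads))
```

With this rule the size of each layer's update relative to its weight norm is approximately `base_lr * trust_coef` for every layer, regardless of how large that layer's gradient is. That property is one common motivation for per-layer scaling at large batch sizes; whether it matches the method studied in this project is not stated in the record.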

Project outcomes

Journal articles (6)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Towards Understanding the relationship of Batch Size and Iterations in Deep Learning

DOI:
Publication year:
2020
Journal:
Impact factor:
0
Authors:
Hiroki Naganuma; Rio Yokota
Corresponding author:
Rio Yokota

Stochastic Weight Averaging (SWA) のハイパーパラメータの影響に関する実験的解析

Experimental Analysis of the Effect of Hyperparameters of Stochastic Weight Averaging (SWA)

DOI:
Publication year:
2020
Journal:
Impact factor:
0
Authors:
Kohei Kawaguchi;Rie Takei-Hoshi;Ikue Yoshikawa;Keiji Nishida;Makoto Kobayashi;Miyako Kusano;Yu Lu;Tohru Ariizumi;Hiroshi Ezura;Shungo Otagaki;Shogo Matsumoto;Katsuhiro Shiratake;所畑貴大; 長沼大樹; 横田理央
Corresponding authors:
所畑貴大; 長沼大樹; 横田理央

深層学習における学習過程の汎化指標解析とハイパーパラメータ最適化への応用

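Analysis of Generalization Metrics of the Training Process in Deep Learning and Its Application to Hyperparameter Optimization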

DOI:
Publication year:
2020
Journal:
Impact factor:
0
Authors:
Otsuka Shotaro;Sakakima Harutoshi;Tani Akira;Nakanishi Kazuki;Takada Seiya;Norimatsu Kosuke;Maejima Hiroshi;Maruyama Ikuro;長沼大樹; 野村将寛; 横田理央
Corresponding authors:
長沼大樹; 野村将寛; 横田理央

Mila/University of Montreal (Canada)