Scheduling method for data transfer of jobs with deadlines based on reinforcement learning

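Basic information

Grant number:
22K12004
Principal investigator:
栗本崇
Amount:
About $26,600
Host institution:
National Institute of Informatics
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2022
Funding country:
Japan
Project period:
2022-04-01 to 2025-03-31
Project status:
Not yet concluded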

Project abstract

We are studying how to apply reinforcement learning to the scheduling of data transfer jobs with deadlines, a problem that is attracting attention in scientific computing and large-scale data centers. Conventional reinforcement learning has mainly targeted situations in which the environment changes deterministically, whereas for data transfer jobs with deadlines the environment changes randomly. Applying reinforcement learning to a problem with a randomly changing environment is therefore the distinguishing feature of this research. The work has two main aspects. The first is the selection of training episodes so that reinforcement learning can learn effectively; the second is identifying a suitable reinforcement learning method. This fiscal year we focused mainly on the first aspect. From randomly arriving job patterns, we selected training episodes while taking their difficulty into account, ran reinforcement learning on them, and scheduled jobs according to the learned result in order to evaluate whether the rate of jobs meeting their deadlines improves. A policy gradient method was used as the deep reinforcement learning algorithm. As training episodes, we selected several patterns of varying difficulty for which the well-known Earliest Deadline First (EDF) algorithm cannot produce an ideal schedule, and ran experiments on them. For low-difficulty patterns, the proposed method produced schedules closer to the ideal than EDF. For high-difficulty patterns, however, we could not obtain results in which the proposed method scheduled closer to the ideal than EDF. We therefore applied curriculum learning, in which training starts on low-difficulty episodes and then continues on high-difficulty episodes. The results confirmed a slight benefit from curriculum learning. We will compile these results and report them at the IEICE Technical Committee on Communication Quality (CQ) meeting in May.
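
To make concrete what a pattern on which EDF cannot produce an ideal schedule can look like, the following is a minimal sketch and not the project's code. Everything in it is an illustrative assumption: the single link that serves one unit of data per time slot, the rule that a job is dropped once its deadline has passed, the three job values, and the names `Job`, `success_rate`, and `best_static_rate`. The brute-force search over priority orders is only a toy-scale reference point; it is not the policy gradient method described in the abstract.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int   # slot at which the job becomes available
    size: int      # number of slots of link time the transfer needs
    deadline: int  # the transfer must be finished by this time


def success_rate(jobs, rank):
    """Fraction of jobs finished by their deadline on one unit-capacity link.

    `rank` maps a job to its priority; the lowest value is served first.
    Assumption: a job whose deadline has passed is dropped, not served.
    """
    remaining = {job: job.size for job in jobs}
    met = 0
    horizon = max(job.deadline for job in jobs)
    for t in range(horizon):
        # Candidates: arrived, unfinished, and still able to use slot t.
        pending = [j for j in jobs
                   if j.arrival <= t < j.deadline and remaining[j] > 0]
        if not pending:
            continue
        job = min(pending, key=rank)
        remaining[job] -= 1
        if remaining[job] == 0:
            met += 1  # finished at time t + 1, which is <= deadline
    return met / len(jobs)


def best_static_rate(jobs):
    """Best success rate over all fixed priority orders (toy sizes only)."""
    return max(success_rate(jobs, order.index) for order in permutations(jobs))


# Hypothetical overloaded pattern: 7 slots of work, last deadline at time 5.
jobs = [Job("A", 0, 3, 3), Job("B", 0, 2, 4), Job("C", 0, 2, 5)]

edf_rate = success_rate(jobs, lambda job: job.deadline)  # EDF priority rule
best_rate = best_static_rate(jobs)
print(f"EDF: {edf_rate:.2f}, best static order: {best_rate:.2f}")
```

Under these assumptions EDF serves job A first and meets only one of the three deadlines, while serving B and then C meets two. Overloaded patterns of this kind leave room for a learned policy to improve on EDF.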