Development of Asynchronous Distributed Multi-module Deep Reinforcement Learning Focusing on Different Control Periods

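Basic information

Grant number: 21H03527
Principal investigator: Eiji Uchibe (内部英治)
Amount: $107,300
Host institution: Advanced Telecommunications Research Institute International
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research (B)
Fiscal year: 2021
Funding country: Japan
Period: 2021-04-01 to 2025-03-31
Project status: Not yet concluded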

Project abstract

This fiscal year we mainly investigated criteria for making model-free and model-based reinforcement learning learn cooperatively. Until now we had used only the "value-function-based method", which selects a learner stochastically according to the magnitude of each learner's value function. This year we additionally implemented and compared a "reward-prediction-error-based method", a "state-prediction-error-based method", and a "learning-based method" that learns a selection strength expressed as a weighting of these criteria. We adopted Deep Deterministic Policy Gradient for model-free reinforcement learning, Stochastic Value Gradient for model-based reinforcement learning, and REINFORCE as the algorithm for the learning-based method. For evaluation we used the FetchReach, FetchSlide, and FetchPickAndPlace tasks provided by OpenAI Gym.

In FetchReach, the simplest task, the value-function-based and learning-based methods selected the model-free learner with increasing probability as learning progressed, whereas the state-prediction-error-based method tended to select the model-based learner with increasing probability. In FetchSlide, which is more complex than FetchReach, the probability of selecting the model-free learner became dominant under both the value-function-based and the state-prediction-error-based methods. In FetchPickAndPlace, the value-function-based method tended to choose the model-based learner in the middle stage of learning and the model-free learner in the late stage, which supports the results of previous studies. In all experiments, the reward-prediction-error-based method showed no trend in learner selection related to learning progress.
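
The selection criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: each learner is scored by a linear weighting of three criteria (its value estimate and its negated reward and state prediction errors), the scores are turned into selection probabilities with a softmax, and the weights are adjusted with a REINFORCE-style update. The function names, the feature layout, and the sign convention are hypothetical and are not taken from the project.

```python
import math
import random


def softmax(prefs):
    """Convert preference scores into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def selection_probs(weights, features):
    """Probability of selecting each learner.

    features[k] is the (assumed) criterion vector of learner k:
    [value estimate, -reward prediction error, -state prediction error].
    """
    prefs = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    return softmax(prefs)


def sample_learner(probs, rng=random):
    """Stochastically pick a learner index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(weights, features, chosen, episode_return, baseline, lr=0.1):
    """One REINFORCE step on the criterion weights.

    For a softmax over linear scores, the gradient of log pi(chosen) with
    respect to the weights is features[chosen] minus the expected features.
    """
    probs = selection_probs(weights, features)
    n = len(weights)
    expected = [sum(p * feats[i] for p, feats in zip(probs, features))
                for i in range(n)]
    grad = [features[chosen][i] - expected[i] for i in range(n)]
    advantage = episode_return - baseline
    return [w + lr * advantage * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    # Made-up numbers: index 0 = model-free learner, index 1 = model-based.
    features = [[1.0, -0.2, -0.5], [0.8, -0.1, -0.1]]
    weights = [1.0, 0.0, 0.0]  # value-function-based selection only
    probs = selection_probs(weights, features)
    chosen = sample_learner(probs)
    weights = reinforce_update(weights, features, chosen,
                               episode_return=1.0, baseline=0.0)
    print(probs, chosen, weights)
```

Setting the weights to `[1.0, 0.0, 0.0]` reduces this sketch to a purely value-function-based selection, while nonzero weights on the other two entries mix in the prediction-error criteria; this is one possible reading of "a selection strength expressed as a weighting", not a confirmed description of the project's implementation.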

Project outputs

Journal articles: 1
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

深層並列強化学習 (Deep Parallel Reinforcement Learning)

DOI:
Year published: 2021
Journal:
Impact factor: 0
Authors: 五月女絢音; 金井理; 伊達宏昭; 遠藤維; 川西康友; 大森敏明; 内部英治
Corresponding author: 内部英治 (Eiji Uchibe)

Other publications by Eiji Uchibe (内部英治)

Cooperative behavior acquisition by learning and evolution in a multi-agent environment for mobile robots

DOI: 10.11501/3155374
Year published: 1999
Journal:
Impact factor: 0
Author: 内部英治 (Eiji Uchibe)
Corresponding author: 内部英治 (Eiji Uchibe)

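Other grants held by Eiji Uchibe (内部英治)

遅延を考慮した非同期分散型マルチモジュール・タイムスケール深層強化学習の開発 (Development of asynchronous distributed multi-module, multi-timescale deep reinforcement learning that accounts for delays)

Grant number: 23K21710
Fiscal year: 2024
Amount: $107,300
Project category: Grant-in-Aid for Scientific Research (B)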
