Self-control of Memory Structure of Reinforcement Learning in Hidden Markov Environments

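Basic Information

Grant number:
11650441
Principal investigator:
ABE Kenichi
Amount:
$22,400
Host institution:
Tohoku University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
1999
Funding country:
Japan
Project period:
1999 to 2000
Project status:
Completed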

Project Abstract

Recent research on reinforcement learning (RL) algorithms has concentrated on partially observable Markov decision problems (POMDPs). One possible approach to POMDPs is to use history information to estimate the state, which means that Q-values must be updated in a form that reflects the past history of observation/action pairs. In this study, we developed two reinforcement learning methods that can solve certain types of POMDPs. The results are summarized as follows.

(1) Under the previous Grant-in-Aid for Scientific Research (C)(2), we proposed Labeling Q-learning (LQ-learning), which has a new memory architecture for handling past history. In this study, we established a general framework for LQ-learning, devised various algorithms within this framework, and compared them through simulation. LQ-learning in this form, however, has the drawback that the labeling mechanism must be predefined. To overcome this drawback, we further devised a SOM (self-organizing feature map) approach to labeling, in which the past history of observation/action pairs is partitioned into classes. The SOM has a one-dimensional structure, and its output nodes produce the labels.

(2) We proposed a new type of hierarchical RL called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any other information leading to "good" subgoals. To handle this decomposition, SQ-learning employs ordered sequences of Q-modules in which each module discovers a local control policy, and it uses a hierarchical system of learning automata to switch between modules. The simulation results demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a huge computational burden.

Building a unified view in which LQ-learning and SQ-learning can be treated systematically is left for future work.
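
The idea that Q-values can be keyed on past observation/action pairs can be illustrated with a minimal sketch. This is not LQ-learning or SQ-learning, whose details the abstract does not give; it is plain tabular Q-learning over a fixed-length history window, run on a hypothetical two-step cue task invented for this illustration. All names here (`make_state`, `run_episode`, `train_and_evaluate`, `CORRIDOR`, the `memory_len` parameter, and the learning-rate settings) are assumptions.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)
CORRIDOR = 2  # aliased observation: looks the same whatever the cue was


def make_state(history, obs, memory_len):
    """State estimate: last `memory_len` (observation, action) pairs plus current observation."""
    recent = history[-memory_len:] if memory_len > 0 else ()
    return (recent, obs)


def choose(q, state, epsilon, rng):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run_episode(q, memory_len, rng, epsilon=0.1, alpha=0.2, gamma=0.9):
    """Hypothetical toy task: observe a cue, then pick the matching action in the corridor."""
    cue = rng.randrange(2)
    history = ()

    # Step 0: the cue is visible; the action taken here earns no reward.
    s0 = make_state(history, cue, memory_len)
    a0 = choose(q, s0, epsilon, rng)
    history += ((cue, a0),)

    # Step 1: the corridor observation hides the cue; reward 1 if action == cue.
    s1 = make_state(history, CORRIDOR, memory_len)
    a1 = choose(q, s1, epsilon, rng)
    reward = 1.0 if a1 == cue else 0.0

    # Standard Q-learning backups (the episode ends after step 1).
    q[(s1, a1)] += alpha * (reward - q[(s1, a1)])
    best_next = max(q[(s1, a)] for a in ACTIONS)
    q[(s0, a0)] += alpha * (gamma * best_next - q[(s0, a0)])
    return reward


def train_and_evaluate(memory_len, episodes=3000, eval_episodes=500, seed=0):
    """Train with exploration, then return the average reward of the greedy policy."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, memory_len, rng)
    # epsilon=0 and alpha=0 make evaluation greedy and leave the table unchanged.
    total = sum(run_episode(q, memory_len, rng, epsilon=0.0, alpha=0.0)
                for _ in range(eval_episodes))
    return total / eval_episodes


if __name__ == "__main__":
    print("memoryless:", train_and_evaluate(memory_len=0))
    print("one-step memory:", train_and_evaluate(memory_len=1))
```

With `memory_len=0` the corridor state is identical for both cues, so the greedy policy can only guess; with `memory_len=1` the previous observation/action pair disambiguates the state and the agent can learn to match the cue. A fixed window like this grows quickly with history length, which is the kind of cost that more compact memory structures are meant to avoid.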

Project Outcomes

Journal articles (45)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Hae Yeon Lee: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. Vol. 2. 484-487 (2000)

Haeyon Lee: "Labeling Q-Learning for Partially Observable Markov Decision Process Environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. 484-490 (2000)

HaeYeon Lee: "Labeling Q-Learning For Non-Markovian Environments" 1999 IEEE International Conference on SMC. Vol. V. 487-491 (1999)

HaeYeon Lee: "Labeling Q-learning for partially observable Markov decision process environments" AROB 5th '00. Vol. 2. 281-284 (2000)

Ikuo Yoshihara: "Extending prediction term of GP-based time series model" AROB 5th '00. Vol. 1. 268-271 (2000)

Other publications by ABE Kenichi

視覚のジオポリティクス : メディアウォールを突き崩す

The Geopolitics of Vision: Breaking Down the Media Wall

DOI:
Publication date:
2005
Journal:
Impact factor:
0
Authors:
西谷修; 中山智香子; 安村直己; 林みどり; 大川正彦; NISHITANI Osamu; NAKAYAMA Chikako; YASUMURA Naoki; HAYASHI Midori; OKAWA Masahiko; 阿部賢一; ABE Kenichi; 西谷修・中山智香子 (eds.)
Corresponding author:
西谷修・中山智香子 (eds.)

鎮圧の後で

After the Suppression

DOI:
Publication date:
2004
Journal:
情況, Vol. 5, No. 9
Impact factor:
0
Authors:
NISHITANI Osamu; NAKAYAMA Chikako (as editors); 田島達也; 川村邦光; 荻野美穂; 成澤勝嗣; 島薗進; 五十嵐公一; HAYASHI Midori; YONETANI Masafumi; 杉原達; 野口剛; 中村生雄; 井田太郎; 赤坂憲雄; 大久保純一; ABE Kenichi; Junichi Okubo; 池上良正; 並木誠士; Seishi Namiki; SAKAI Takashi; 玉蟲敏子; 冨山一郎; Satoko Tamamushi
Corresponding author:
冨山一郎

理性の探求(5)名づけと所有--アメリカという制度空間

The Quest for Reason (5): Naming and Possession: America as an Institutional Space

DOI:
Publication date:
2005
Journal:
UP, May issue
Impact factor:
0
Authors:
NISHITANI Osamu; NAKAYAMA Chikako (as editors); 田島達也; 川村邦光; 荻野美穂; 成澤勝嗣; 島薗進; 五十嵐公一; HAYASHI Midori; YONETANI Masafumi; 杉原達; 野口剛; 中村生雄; 井田太郎; 赤坂憲雄; 大久保純一; ABE Kenichi; Junichi Okubo; 池上良正; 並木誠士; Seishi Namiki; SAKAI Takashi; 玉蟲敏子; 冨山一郎; Satoko Tamamushi; 西谷修
Corresponding author:
西谷修

A Tikopia in the Global Era : Using Mediation to Empower Coffee Growing Communities in East Timor

DOI:
Publication date:
2009
Journal:
Impact factor:
0
Authors:
Tarsitani; Belle Asante; ABE Kenichi
Corresponding author:
ABE Kenichi

暴力の哲学

The Philosophy of Violence

DOI:
Publication date:
2004
Journal:
Impact factor:
0
Authors:
西谷修; 中山智香子; 安村直己; 林みどり; 大川正彦; NISHITANI Osamu; NAKAYAMA Chikako; YASUMURA Naoki; HAYASHI Midori; OKAWA Masahiko; 阿部賢一; ABE Kenichi; 西谷修・中山智香子 (eds.); 西谷修・中山智香子 (co-edited and co-authored); 酒井隆史
Corresponding author:
酒井隆史