Study on Decentralized Learning Algorithms in Non-Markovian Environments

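Basic information

Grant number:
09650451
Principal investigator:
ABE Kenichi
Amount:
$16,000
Host institution:
Tohoku University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
1997
Funding country:
Japan
Project period:
1997 to 1998
Project status:
Completed

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-09650451/
Keywords: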
HIDDEN MARKOV MODEL; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; REINFORCEMENT LEARNING; Q-LEARNING; LABELING Q-LEARNING; LEARNING AUTOMATON; NEURAL NETWORKS; DECENTRALIZED LEARNING

Project abstract

The results of this study are summarized as follows:

(1) A formal model of non-Markovian problems is the partially observable Markov decision problem (POMDP). The most useful way to overcome partial observability is to use memory to estimate the state. In this study, we proposed a new memory architecture for reinforcement learning algorithms to solve a certain type of POMDP. The agent's task is to discover a path leading from a start position to a goal in a partially observable maze. The agent's lifetime is assumed to be separable into "trials". The basic framework of the algorithm, called labeling Q-learning, is as follows. Let $O$ be the finite set of observations. At each step $t$, when the agent gets an observation $o_t \in O$ from the environment, a label $\theta_t$ is attached to the observation, where $\theta_t$ is an element of $\Theta = \{0, 1, 2, \ldots, M-1\}$ (at the beginning of each trial, the labels for all $o \in O$ are initialized to 0). Then the pair $\tilde{o}_t = (o_t, \theta_t)$ defines a new observation, and the usual reinforcement learning algorithm TD($\lambda$) with replacing traces is applied to $\tilde{O} = O \times \Theta$, as if the pair $\tilde{o}_t = (o_t, \theta_t)$ had the Markov property.

(2) Labeling Q-learning was applied to test problems of simple mazes taken from the recent literature. The results demonstrated that labeling Q-learning works well, in a near-optimal manner.

(3) Most problems will have continuous or large discrete observation spaces. We studied generalization techniques based on recurrent neural networks (RNNs) and holon networks, which allow compact storage of similar observations. Further, we developed an approximate method for controlling the complexity of RNNs, i.e., the Lyapunov exponent, and demonstrated the method by applying it to identification problems for certain nonlinear systems.

(4) We carried out fundamental experiments on sensor-based navigation for a mobile robot.
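
The labeling idea in item (1) of the abstract can be illustrated with a minimal Python sketch. The abstract does not say how a label is chosen at each step, so the rule below (count earlier visits to the same observation within the trial, capped at M - 1) is an assumption made only for illustration. The update shown is tabular Sarsa(lambda) with replacing traces, used here as one instance of the TD(lambda) family the abstract mentions; it is not necessarily the project's exact algorithm. The names `attach_label` and `sarsa_lambda_step`, the observation strings, and the parameter values are all hypothetical.

```python
from collections import defaultdict


def attach_label(obs, counts, num_labels):
    """Return the augmented observation (obs, label) for the current step.

    ASSUMPTION (not stated in the abstract): the label is the number of
    earlier visits to `obs` in the current trial, capped at num_labels - 1.
    """
    label = min(counts[obs], num_labels - 1)
    counts[obs] += 1
    return (obs, label)


def sarsa_lambda_step(q, traces, x, a, reward, x_next, a_next, done,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update with replacing traces.

    `x` and `x_next` are augmented observations, treated as if Markov.
    """
    target = reward if done else reward + gamma * q[(x_next, a_next)]
    delta = target - q[(x, a)]
    traces[(x, a)] = 1.0  # replacing trace: reset to 1, do not accumulate
    for key in list(traces):
        q[key] += alpha * delta * traces[key]
        traces[key] *= gamma * lam  # decay every eligibility trace
    return delta


# Hypothetical trial with M = 3 labels; counts are reset at each trial start,
# so every label begins at 0.
counts = defaultdict(int)
q = defaultdict(float)
traces = defaultdict(float)

raw = ["wall", "door", "wall", "wall", "wall"]
xs = [attach_label(o, counts, num_labels=3) for o in raw]

# Two updates along the trial; the second transition ends it with reward 1.
sarsa_lambda_step(q, traces, xs[0], "right", 0.0, xs[1], "right", done=False)
sarsa_lambda_step(q, traces, xs[1], "right", 1.0, xs[2], "right", done=True)
```

Under this assumed rule, repeated "wall" observations map to different table entries, so the learner can assign different action values to positions that look identical to the sensors. Once the cap M - 1 is reached, further visits share one label, which is why such a scheme can only disambiguate a limited class of POMDPs.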

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

本間経康: "神経回路網ダイナミクスの複雑さの制御法" 計測自動制御学会論文集. 35・1. 138-143 (1999)

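本間経康: "A method for controlling the complexity of neural network dynamics." Transactions of the Society of Instrument and Control Engineers. 35(1). 138-143 (1999).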

喜多川健: "リカレントニューラルネットワークの創発的学習手法" 計測自動制御学会論文集. 33巻11号. 1093-1098 (1997)

喜多川健: "An emergent learning method for recurrent neural networks." Transactions of the Society of Instrument and Control Engineers. Vol. 33, No. 11. 1093-1098 (1997).

Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative" Neural Computation. (in press).

Other publications by ABE Kenichi

視覚のジオポリティクス : メディアウォールを突き崩す

The Geopolitics of Vision: Breaking Down the Media Wall

Publication year:
2005
Impact factor:
0
Authors:
西谷修;中山智香子;安村直己;林みどり;大川正彦;NISHITANI Osamu;NAKAYAMA Chikako;YASUMURA Naoki;HAYASHI Midori;OKAWA Masahiko;阿部賢一;ABE Kenichi;西谷修・中山智香子 (eds.)
Corresponding author:
西谷修・中山智香子 (eds.)

鎮圧の後で

After the Suppression

Publication year:
2004
Journal:
情況, Vol. 5, No. 9
Impact factor:
0
Authors:
NISHITANI Osamu;NAKAYAMA Chikako (as editors);田島達也;川村邦光;NAKAYAMA Chikako;荻野美穂;成澤勝嗣;島薗進;五十嵐公一;HAYASHI Midori;YONETANI Masafumi;杉原達;野口剛;中村生雄;井田太郎;赤坂憲雄;大久保純一;ABE Kenichi;Junichi Okubo;池上良正;並木誠士;Seishi Namiki;SAKAI Takashi;玉蟲敏子;冨山一郎;Satoko Tamamushi
Corresponding author:
冨山一郎

理性の探求(5)名づけと所有--アメリカという制度空間

The Pursuit of Reason (5): Naming and Ownership -- America as an Institutional Space

Publication year:
2005
Journal:
UP, May issue
Impact factor:
0
Authors:
NISHITANI Osamu;NAKAYAMA Chikako (as editors);田島達也;川村邦光;NAKAYAMA Chikako;荻野美穂;成澤勝嗣;島薗進;五十嵐公一;HAYASHI Midori;YONETANI Masafumi;杉原達;野口剛;中村生雄;井田太郎;赤坂憲雄;大久保純一;ABE Kenichi;Junichi Okubo;池上良正;並木誠士;Seishi Namiki;SAKAI Takashi;玉蟲敏子;冨山一郎;Satoko Tamamushi;西谷修
Corresponding author:
西谷修

A Tikopia in the Global Era : Using Mediation to Empower Coffee Growing Communities in East Timor

Publication year:
2009
Impact factor:
0
Authors:
Tarsitani;Belle Asante;ABE Kenichi
Corresponding author:
ABE Kenichi

暴力の哲学

The Philosophy of Violence

Publication year:
2004
Impact factor:
0
Authors:
西谷修;中山智香子;安村直己;林みどり;大川正彦;NISHITANI Osamu;NAKAYAMA Chikako;YASUMURA Naoki;HAYASHI Midori;OKAWA Masahiko;阿部賢一;ABE Kenichi;西谷修・中山智香子 (eds.);西谷修・中山智香子 (co-editors and co-authors);酒井隆史
Corresponding author:
酒井隆史