Model-based reinforcement learning: brain implementation and engineering applications

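Basic information

Grant number:
15300102
Principal investigator:
ISHII Shin
Amount:
$76,800
Host institution:
Nara Institute of Science and Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2003
Funding country:
Japan
Period:
2003 to 2005
Project status:
Completed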

Project abstract

[On-line Bayesian learning schemes] We devised an on-line Bayesian learning algorithm that can be applied to Gaussian stochastic processes and can estimate the system dimensionality and the occurrence of changes in the target dynamics (Hirayama et al., 2004). We also devised a sequential Monte Carlo-based method that can be applied to non-Gaussian stochastic processes, and applied it to visual tracking problems (Bando et al., in press).

[Applications of model-based reinforcement learning and on-line learning] We succeeded in enabling a simulated biped robot to walk autonomously, based on a combination of a central pattern generator and reinforcement learning. We later extended this approach to incorporate policy-gradient-based reinforcement learning. Further introducing an on-line model identification method accelerated the autonomous learning of the biped simulator (Nakamura et al., 2005). Our reinforcement learning for a switching controller succeeded in swinging up and stabilizing an underactuated real robot, the acrobot. An autonomous training scheme that combines model-based reinforcement learning with on-line model learning can construct an agent for a multi-agent card game that is as strong as a human expert player (Ishii et al., 2005).

[Reward-related prefrontal neural activities of primates] An electrophysiological study using a memory-based sensorimotor processing task in primates revealed that reward expectation significantly enhanced the selectivity of sensory working memory but not that of motor memory (Amemori et al., 2005).

[Neuropsychological study of human prefrontal information processing] We developed a model of the information processing that takes place while a human performs a Markov decision process, and evaluated the plausibility of the model through neuropsychological studies with functional magnetic resonance imaging. We found that the dorsolateral prefrontal cortex is engaged (Yoshida et al., 2005). When the Markov decision environment involves uncertainty, that uncertainty could be resolved in the frontopolar prefrontal cortex (Yoshida et al., in press).

Project outcomes

Journal papers (96)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller

DOI:
10.20965/jrm.2005.p0636
Published:
2005-12
Journal:
J. Robotics Mechatronics
Impact factor:
0
Authors:
Yutaka Nakamura;Takeshi Mori;Yoichi Tokita;T. Shibata;S. Ishii
Corresponding authors:
Yutaka Nakamura;Takeshi Mori;Yoichi Tokita;T. Shibata;S. Ishii

A model of smooth pursuit in primates based on learning the target dynamics

DOI:
10.1016/j.neunet.2005.01.001
Published:
2005-04
Journal:
Neural networks : the official journal of the International Neural Network Society
Impact factor:
0
Authors:
T. Shibata;H. Tabata;S. Schaal;M. Kawato
Corresponding authors:
T. Shibata;H. Tabata;S. Schaal;M. Kawato

Acrobot control by learning the switching of multiple controllers