Memory-Based Operant Learning

Basic Information

Award number: 9978403
Principal investigator: David Touretzky
Amount: approx. $338,300
Host institution: Carnegie-Mellon University
Institution country: United States
Award type: Continuing Grant
Fiscal year: 1999
Funding country: United States
Duration: 1999-12-15 to 2004-11-30
Status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=9978403&HistoricalAwards=false
Keywords: Memory Based Operant Learning

Abstract

The PI will develop a cognitively plausible reinforcement learning (RL) architecture as a model of instrumental learning in animals and robots. Although RL was initially inspired by animal learning phenomena, the field has since developed mainly by addressing AI concerns. A major limitation of current RL architectures as cognitive theories is how they represent the state space. Models that maintain explicit state representations (such as Q-tables) are limited to simple domains with only a few variables, while models that represent the state space implicitly (e.g., using a neural-net function approximator) require large amounts of training data and unreasonably long training times compared with real animals. The PI's approach is to develop specialized state-space representations that are appropriate for modeling animal behavior and that can support the desired generalizations. The simulated animal's working memory will encode sensory stimuli, state-change events, and the animal's own actions. An explicit state representation would encode the conjunction of all these variables, causing a combinatorial explosion. The proposed alternative is for the model to form conjunctions of selected variables only, allowing it to expand its state description incrementally while focusing on just those dimensions that are relevant to the task being learned. Heuristics based on fast, single-layer neural-net learning will be developed to select useful conjunctions as a function of recent experience. The PI will also investigate matching the current state of working memory against records of entire past states, or episodes, in order to predict reward. A flexible architecture will be developed for representing actions in a parameterized manner (so as to provide infinite variability) and with temporal duration (so that stimuli and rewards can arrive in the midst of execution). Finally, there will be mechanisms for coping with an action's failure to execute successfully or to produce the expected reward; these will provide the basis for modeling phenomena such as the effects of partial reinforcement schedules and the increased behavioral variability seen during extinction. If successful, this work will advance the state of the art in reinforcement learning by introducing new techniques for handling complex state and action spaces. This has important implications for theories of animal cognition, for robots that learn by exploration and experimentation, and for robots intended to learn from human teachers.
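
The award record does not say how conjunctions are scored or selected, so the following is only a minimal sketch under stated assumptions, not the PI's architecture. It assumes working memory is a dictionary of discrete variable values, limits candidate conjunctions to pairs, and uses a single-layer linear reward predictor trained with the delta (LMS) rule as a stand-in for the "fast, single-layer neural net learning" mentioned above; each pair is scored by the magnitude of its learned weight. A tabular state key is then built from only the chosen variables. All names (`ConjunctionScorer`, `state_key`, `train`, and the light/lever/tone task) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

Item = tuple[str, str]    # (variable, value), e.g. ("light", "on")
Conj = tuple[Item, Item]  # a pairwise conjunction of two items


def conjunctions(memory: dict[str, str]) -> list[Conj]:
    """Every pairwise conjunction of the (variable, value) items in working memory."""
    return list(combinations(sorted(memory.items()), 2))


class ConjunctionScorer:
    """Single-layer linear reward predictor trained with the delta (LMS) rule.

    Each conjunction present in working memory is a binary input feature; the
    magnitude of its learned weight serves as a rough usefulness score.
    """

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.w: defaultdict[Conj, float] = defaultdict(float)

    def predict(self, memory: dict[str, str]) -> float:
        return sum(self.w[c] for c in conjunctions(memory))

    def update(self, memory: dict[str, str], reward: float) -> float:
        """One delta-rule step: move every active weight to reduce the prediction error."""
        error = reward - self.predict(memory)
        for c in conjunctions(memory):
            self.w[c] += self.lr * error
        return error

    def best(self) -> Conj:
        """The conjunction whose weight has the largest magnitude."""
        return max(self.w, key=lambda c: abs(self.w[c]))


def state_key(memory: dict[str, str], conj: Conj) -> tuple[Item, ...]:
    """Table key built only from the variables named in the selected conjunction."""
    return tuple((var, memory[var]) for var, _ in conj)


# Hypothetical task: reward arrives only when the light is on AND the animal's own
# last action, recorded in working memory, was a lever press; the tone is irrelevant.
STATES = [
    {"light": light, "lever": lever, "tone": tone}
    for light in ("off", "on")
    for lever in ("idle", "pressed")
    for tone in ("off", "on")
]


def reward(memory: dict[str, str]) -> float:
    return 1.0 if memory["light"] == "on" and memory["lever"] == "pressed" else 0.0


def train(epochs: int = 500, lr: float = 0.1) -> ConjunctionScorer:
    """Cycle through every state repeatedly, applying one delta-rule step each."""
    scorer = ConjunctionScorer(lr)
    for _ in range(epochs):
        for m in STATES:
            scorer.update(m, reward(m))
    return scorer


if __name__ == "__main__":
    chosen = train().best()
    print(chosen)  # (('lever', 'pressed'), ('light', 'on'))
    full = {tuple(sorted(m.items())) for m in STATES}
    reduced = {state_key(m, chosen) for m in STATES}
    print(len(full), len(reduced))  # 8 4
```

In this toy task the reduced key distinguishes 4 states instead of the 8 that a full conjunction of three binary variables needs; with n binary variables the full table grows as 2^n, which is the combinatorial explosion the abstract describes. The pairwise candidate set and the largest-weight criterion are simplifying choices for illustration only.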