
EAGER: Memory-based learning of effective actions

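Basic Information

Award Number:
1252987
Principal Investigator:
Benjamin Kuipers
Amount:
$80,000
Institution:
Regents of the University of Michigan - Ann Arbor
Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2012
Funding Country:
United States
Start and End Dates:
2012-09-01 to 2014-08-31
Project Status:
Completed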

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1252987&HistoricalAwards=false
Keywords:
EAGER Memory based learning effective

Project Abstract

This project addresses the foundational question in Robust Intelligence of how an autonomous agent can use low-level, sub-symbolic (pixel-level) sensorimotor experiences with its environment to learn higher-level effective concepts, ranging from learning to use a hand to manipulate objects on a tabletop, to learning to balance and walk, to learning to move through a complex environment without collisions with walls or pedestrians. This project will develop computational models of how this learning process could take place and will implement and test these computational models on an actual robot. Understanding such autonomous concept learning has the potential to impact a range of disciplines including Cognitive Science, Psychology, AI in general, and robotics, computer vision, and machine learning in particular. Understanding how concepts come into being and evolve in the specific domain of robot navigation also has the potential to contribute to advances in systems that help persons with physical and learning disabilities.

The project draws on insights from two approaches from the PI's lab that have complementary strengths: (1) QLAP (Qualitative Learner of Action and Perception), and (2) the MPEPC (Model Predictive Equilibrium Point Control) system. The QLAP system exploits a qualitative abstraction of continuous sensor input in order to learn causal contingencies, to learn DBN (Dynamic Belief Network) and MDP models of the causal world, and to build a hierarchy of action models. It uses perception with laser rangefinders and correlation peaks between changes to the motor vector and events in the sense vector (so-called contingencies) to discern motor signals whose resulting perceptual events may be more than random variation. Reliable episodes can be remembered as cases and used in learning.
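
As a rough illustration of the contingency idea, the sketch below looks for a time lag at which a perceptual event reliably follows a motor change and compares that rate with the event's base rate. It is a simplified stand-in and not QLAP's actual algorithm; the function names, the boolean event encoding, and the fixed lag window are all assumptions made for this example.

```python
def lagged_contingency(motor_events, sense_events, max_lag):
    """For each lag, estimate P(sense event at t + lag | motor event at t).

    Both inputs are equal-length sequences of booleans (hypothetical encoding:
    True marks a time step at which the event occurred).
    """
    n = len(motor_events)
    scores = {}
    for lag in range(max_lag + 1):
        # Time steps with a motor event that still have a step at t + lag.
        triggers = [t for t in range(n - lag) if motor_events[t]]
        if not triggers:
            scores[lag] = 0.0
            continue
        hits = sum(1 for t in triggers if sense_events[t + lag])
        scores[lag] = hits / len(triggers)
    return scores


def best_contingency(motor_events, sense_events, max_lag):
    """Return (lag, conditional rate, base rate) for the strongest lag."""
    scores = lagged_contingency(motor_events, sense_events, max_lag)
    base_rate = sum(1 for s in sense_events if s) / len(sense_events)
    lag = max(scores, key=scores.get)
    return lag, scores[lag], base_rate
```

A conditional rate well above the base rate at some lag suggests that the motor change, rather than random variation, produces the perceptual event.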
The MPEPC system factors the continuous navigation problem for a mobile robot into a local unconstrained control and a global optimization process that balances constraints such as progress and collision avoidance. Both methods have a local phase (learning contingencies and local control laws), and a global phase (learning a hierarchy of actions and finding extended routes that balance constraints). These two approaches will be augmented by learning methods from Case-Based Reasoning (CBR) that use features of the presenting case to retrieve related cases from case memory. Two levels of case representation will be employed. The lowest level case representation is a simple feature vector: in the case of local motion control, it specifies the target pose location in the egocentric frame of reference, along with the parameters of the motion control law that attempts to reach it, and the quality of the resulting trajectory. Retrieval will be done using Nearest Neighbor, combining information from the retrieved cases by Locally Weighted Regression or Locally Weighted Projection Regression. At the higher level of action learning, a case is to be described by identifying the critical environmental constraints that determine the global structure of the action.
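
The low-level retrieval step described above (Nearest Neighbor followed by Locally Weighted Regression) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the project's implementation: each case is assumed to be a `(features, value)` pair, where `features` might hold a target pose in the egocentric frame and `value` a single control-law parameter, and the Gaussian kernel, `bandwidth`, and small `ridge` term are choices made for this example.

```python
import math


def nearest_cases(cases, query, k):
    """Return the k cases whose feature vectors are closest to the query."""
    return sorted(cases, key=lambda c: math.dist(c[0], query))[:k]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lwr_predict(cases, query, k=5, bandwidth=1.0, ridge=1e-6):
    """Predict a value at `query` by a weighted linear fit to the k nearest cases."""
    neighbors = nearest_cases(cases, query, k)
    d = len(query) + 1  # one slope per feature plus an intercept
    xtwx = [[0.0] * d for _ in range(d)]
    xtwy = [0.0] * d
    for features, value in neighbors:
        # Gaussian kernel: closer cases get more influence on the fit.
        w = math.exp(-math.dist(features, query) ** 2 / (2 * bandwidth ** 2))
        # Center the features on the query so the intercept is the prediction.
        row = [1.0] + [f - q for f, q in zip(features, query)]
        for i in range(d):
            xtwy[i] += w * row[i] * value
            for j in range(d):
                xtwx[i][j] += w * row[i] * row[j]
    for i in range(d):
        xtwx[i][i] += ridge  # keeps the system solvable with few neighbors
    return solve(xtwx, xtwy)[0]
```

For example, with cases stored as `((x, y), parameter)` pairs, `lwr_predict(cases, (0.5, 0.5), k=5)` returns a parameter estimate for a target at that position, interpolated from the five most similar remembered cases.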
