
Scalable Reinforcement Learning Methods for Learning in Real-Time with Robots


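Basic Information

Grant number:
RGPIN-2021-02690
Principal investigator:
Mahmood, Ashique
Amount:
$17,500
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Period:
2021-01-01 to 2022-12-31
Project status:
Completed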

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=742157
Keywords:
Scalable Reinforcement Learning Methods Real

Project Abstract

Reinforcement learning promises continually adaptive systems for the many tasks that humans do well but that are physically laborious, such as housekeeping, warehouse fulfillment, and delivery services. Such tasks require the agent to have a common-sense understanding of a dynamically changing physical environment, which is difficult to enumerate and build into a system through hand-engineering. The proposed program aims to develop robotic learning systems that interact with the physical world and adapt in real time. Some of the most promising approaches in reinforcement learning for robotics are based on learning from human-provided demonstration data and simulators. However, approaches reliant on human intervention are neither scalable nor sufficient for developing robotic systems that can adapt their performance in real time to new or changing environments. Our proposed program complements the existing approaches by developing scalable and automatic mechanisms for continually learning robotic systems. All advanced deep reinforcement learning methods for control use expensive learning mechanisms such as those based on experience replay buffers. While such expensive learning mechanisms are more appropriate for training offline or in the cloud, we propose a lightweight onboard learning system that adapts and reacts to changes quickly in real time. Our proposed onboard learning system will be composed of computationally inexpensive and stable policy and representation learning algorithms. We consider the policy to be only the last semi-linear layer of the network, for which gradient updates can be made more stably without using replay buffers. In addition, the onboard system will perform representation learning only through random perturbation of a small portion of the hidden nodes.
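To make the idea of a last-layer policy updated without a replay buffer concrete, the following is a minimal sketch and not the program's actual method. It assumes a softmax over linear action preferences as the "semi-linear" layer, a running-average reward baseline, and a hypothetical two-context toy task. The names `LastLayerPolicy` and `train_toy_policy` are illustrative only, and the input features stand in for the output of hidden layers that are held fixed here.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class LastLayerPolicy:
    """Softmax policy whose only learned parameters are the final layer's weights."""

    def __init__(self, n_features, n_actions, step_size=0.1, baseline_rate=0.05):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.step_size = step_size
        self.baseline_rate = baseline_rate
        self.baseline = 0.0  # running average of reward, used to reduce variance

    def probabilities(self, features):
        prefs = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        return softmax(prefs)

    def act(self, features, rng):
        probs = self.probabilities(features)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, features, action, reward):
        """One policy-gradient step from a single sample, which is then discarded."""
        probs = self.probabilities(features)
        advantage = reward - self.baseline
        for a, row in enumerate(self.weights):
            # Gradient of log pi(action | features) w.r.t. the preference of action a.
            coeff = (1.0 if a == action else 0.0) - probs[a]
            for i, x in enumerate(features):
                row[i] += self.step_size * advantage * coeff * x
        self.baseline += self.baseline_rate * (reward - self.baseline)


def train_toy_policy(steps=2000, seed=0):
    """Hypothetical task: reward 1 when the action index matches the context."""
    rng = random.Random(seed)
    policy = LastLayerPolicy(n_features=2, n_actions=2)
    for _ in range(steps):
        context = rng.randrange(2)
        features = [1.0, 0.0] if context == 0 else [0.0, 1.0]
        action = policy.act(features, rng)
        reward = 1.0 if action == context else 0.0
        policy.update(features, action, reward)  # no replay buffer is kept
    return policy
```

Calling `train_toy_policy()` returns a policy whose `probabilities([1.0, 0.0])` should favor action 0. Each sample is used for exactly one update, so memory and compute per step stay constant, which is the property the abstract attributes to an onboard learner.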
We investigate whether such a lightweight learning system, used in conjunction with a more expensive replay-based learning system, performs better than replay-based learning alone. The proposed program also aims to develop efficient and stable policy and representation learning methods. We develop a theoretical framework that enriches our understanding of how to create new and efficient policy learning methods in a directed way. For representation learning, we extend an existing strategy for representation search, called generate-and-test, to reinforcement learning. We develop a general generate-and-test mechanism in which the utility of features is defined solely in terms of the loss function, making it applicable to any loss function and neural architecture. Computationally inexpensive learning mechanisms are essential for making reinforcement learning systems more accessible and applicable to robotics. The lightweight onboard system of the proposed program will allow graduate students, entrepreneurs, and enthusiasts around the world to build continually learning robots more easily, relieving humans of numerous laborious tasks.
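The generate-and-test idea with a loss-based utility can be illustrated roughly as follows. This is a sketch under stated assumptions, not the mechanism developed in the program. It assumes a single hidden layer of random tanh features, a utility defined as the rise in loss when one feature's contribution is removed, and a finite-difference derivative so that the update only ever calls the loss function. The names `feature_utilities`, `GenerateAndTestLayer`, and `replace_worst` are hypothetical.

```python
import math
import random
import statistics


def squared_loss(prediction, target):
    return 0.5 * (prediction - target) ** 2


def feature_utilities(hidden, out_weights, target, loss_fn=squared_loss):
    """Utility of feature i: how much the loss rises if its contribution is removed."""
    prediction = sum(w * h for w, h in zip(out_weights, hidden))
    base = loss_fn(prediction, target)
    return [loss_fn(prediction - w * h, target) - base
            for w, h in zip(out_weights, hidden)]


class GenerateAndTestLayer:
    """One hidden layer of random tanh features with a learned linear output."""

    def __init__(self, n_inputs, n_hidden, rng, decay=0.99):
        self.n_inputs = n_inputs
        self.rng = rng
        self.decay = decay
        self.in_weights = [self._generate() for _ in range(n_hidden)]
        self.out_weights = [0.0] * n_hidden
        self.utility = [0.0] * n_hidden  # running average of per-sample utilities

    def _generate(self):
        """Generator: propose a new feature with random input weights."""
        return [self.rng.gauss(0.0, 1.0) for _ in range(self.n_inputs)]

    def hidden(self, x):
        return [math.tanh(sum(v * xi for v, xi in zip(row, x)))
                for row in self.in_weights]

    def learn(self, x, target, step_size=0.05, loss_fn=squared_loss, eps=1e-4):
        """One online step: update utilities, then the output weights."""
        h = self.hidden(x)
        fresh = feature_utilities(h, self.out_weights, target, loss_fn)
        self.utility = [self.decay * u + (1.0 - self.decay) * f
                        for u, f in zip(self.utility, fresh)]
        prediction = sum(w * hi for w, hi in zip(self.out_weights, h))
        # Central difference: the step needs only loss values, not the loss's form.
        dloss = (loss_fn(prediction + eps, target)
                 - loss_fn(prediction - eps, target)) / (2.0 * eps)
        self.out_weights = [w - step_size * dloss * hi
                            for w, hi in zip(self.out_weights, h)]
        return loss_fn(prediction, target)

    def replace_worst(self, n_replace=1):
        """Tester: swap the lowest-utility features for newly generated ones."""
        order = sorted(range(len(self.utility)), key=lambda i: self.utility[i])
        fresh_utility = statistics.median(self.utility)
        replaced = order[:n_replace]
        for i in replaced:
            self.in_weights[i] = self._generate()
            self.out_weights[i] = 0.0  # a new feature starts with no effect on output
            self.utility[i] = fresh_utility  # avoid immediately re-selecting it
        return replaced
```

In use, one would call `learn` on every incoming sample and `replace_worst` every so often. Because both the utility and the weight update are computed only from calls to `loss_fn`, another loss can be passed in without changing the rest of the code, which mirrors the loss-agnostic property described above.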
