Reinforcement Learning on the Edge: Specialised Control Policies with Limited Resources

Basic Information

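Grant number: 2646067
Principal investigator: (not listed)
Amount: --
Host institution: University of Cambridge
Host institution country: United Kingdom
Project type: Studentship
Fiscal year: 2022
Funding country: United Kingdom
Duration: 2022 onward (end date not available)
Project status: Ongoing

Source: https://gtr.ukri.org/projects?ref=studentship-2646067
Keywords: Reinforcement Learning Edge Specialised Control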

Project Abstract

In recent years, there has been increasing interest in real-world applications of Reinforcement Learning (RL). One prominent domain is mobile robotics. Robots use sensors to observe their environment and actuators to act in it. Even when such a robot is controlled by a policy learned through RL, it typically must rely on a centralised server to determine its next action. This significantly increases the robot's latency and makes RL impractical for real-world deployment. Furthermore, robots may encounter situations that differ from those on which they were trained. This project aims to solve both of these problems. Firstly, it seeks to implement efficient on-device inference for RL models. Secondly, it will investigate new methods of lifelong RL that can continually learn from new experience. Through these lines of investigation, the project seeks to introduce a new paradigm of RL that leverages the experience and processing power of multiple resource-constrained learners operating simultaneously. These learners will be tasked both with evaluating RL policies and with integrating new observations into their models in a resource-efficient way. Learners should periodically share this information with other devices, as in existing distributed RL approaches. New learning approaches and RL algorithms will need to be developed for this goal to be realised. The project combines work from systems and machine learning in a novel context.
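The abstract leaves the algorithms open. As a purely illustrative sketch of the paradigm it describes (several learners, each running inference and updating a small model on device, then periodically sharing what they learned), the Python below assumes tabular Q-learning on a hypothetical toy chain environment and uses element-wise averaging of Q-tables as the sharing step. That averaging scheme is a swapped-in choice, loosely analogous to federated averaging, not the project's method; the names `EdgeLearner`, `share`, `train` and `step` are hypothetical.

```python
import random

# Hypothetical toy environment: a chain of states 0..5, goal at state 5.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1.0 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


class EdgeLearner:
    """Hypothetical resource-constrained learner holding a small tabular Q-function."""

    def __init__(self, seed, alpha=0.5, gamma=0.9, epsilon=0.2):
        self.rng = random.Random(seed)
        self.q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state, explore=True):
        # Cheap on-device inference: epsilon-greedy table lookup, random tie-breaking.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        best = max(self.q[state])
        return self.rng.choice([a for a in ACTIONS if self.q[state][a] == best])

    def run_episode(self, max_steps=50):
        # Integrate new observations locally with one-step Q-learning updates.
        state = 0
        for _ in range(max_steps):
            action = self.act(state)
            nxt, reward, done = step(state, action)
            target = reward if done else reward + self.gamma * max(self.q[nxt])
            self.q[state][action] += self.alpha * (target - self.q[state][action])
            state = nxt
            if done:
                break


def share(learners):
    """Periodic synchronisation (assumed scheme): every learner adopts the element-wise mean table."""
    n = len(learners)
    avg = [[sum(l.q[s][a] for l in learners) / n for a in ACTIONS] for s in range(N_STATES)]
    for l in learners:
        l.q = [row[:] for row in avg]


def train(num_learners=3, rounds=20, episodes_per_round=5):
    learners = [EdgeLearner(seed=i) for i in range(num_learners)]
    for _ in range(rounds):
        for learner in learners:
            for _ in range(episodes_per_round):
                learner.run_episode()
        share(learners)
    return learners


if __name__ == "__main__":
    learners = train()
    policy = [learners[0].act(s, explore=False) for s in range(N_STATES - 1)]
    print("greedy actions for non-terminal states:", policy)
```

Averaging whole tables is the simplest sharing rule to illustrate; a real edge deployment would also have to weigh communication cost, heterogeneous devices and non-stationary experience, which this sketch ignores.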
