Task-general reinforcement learning algorithms

Basic information

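Grant number:
2426703
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2020
Funding country:
United Kingdom
Start and end dates:
2020 to (no data)
Project status:
Completed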

Source:
https://gtr.ukri.org/projects?ref=studentship-2426703
Keywords:
Task general reinforcement learning algorithms

Project abstract

This project falls within the EPSRC Information and communication technologies (ICT) research area. The goal of the project is to develop algorithms capable of extracting task-general structure from data and using that structure to learn efficiently on novel tasks. These algorithms would enable deployment of reinforcement learning agents in the real world, where they could acquire new skills from little experience. This data-efficient skill acquisition could lead to more economical automation of industrial processes and household routines. While task-generalization in reinforcement learning is not a new research topic, tackling it effectively requires zooming out from the single-task perspective that has been prevalent in the field recently. The first step away from that perspective is to treat generalization performance on novel tasks as a key measure of success. This is adopted as a central evaluation criterion for the algorithms developed in this project. The evaluation will consist of gathering both empirical and theoretical support for the algorithms. The outcomes of the development and evaluation are published in the top conferences and academic journals of the artificial intelligence and machine learning communities.

One way to improve task-generalization in reinforcement learning is to consider a higher-level learning problem, called meta-learning, which aims to learn the learning algorithm itself with the explicit objective of fast learning on novel tasks. Meta-learning is a promising tool for task-generalization because it turns generalization itself into a learning problem, which makes it possible to leverage the strength of deep learning when abundant data is available. While meta-learning has seen a surge of interest and many exciting contributions in the past few years, the generalization performance of these approaches on genuinely novel tasks outside the training task distributions has received only limited attention. This lack of attention serves as a signpost guiding this research project into the relatively unexplored territory of tackling the questions of task-generalization explicitly.

Concretely, in this project, new algorithms and training environments are developed for task-general reinforcement learning. To develop new algorithms, novel meta-parameterizations of reinforcement learning agents and of the algorithms themselves will be considered. The generalization performance of reinforcement learning agents depends not only on the training algorithm but also on the training environments and datasets. Therefore, new generalization-focused training environments will have to be developed.
