Towards Open-ended Reinforcement Learning using Synthetic Environment Generation

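Basic information

Grant number:
2711309
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Ongoing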

Source:
https://gtr.ukri.org/projects?ref=studentship-2711309
Keywords:
Towards Open ended Reinforcement Learning

Project abstract

Sequential decision-making problems are ubiquitous in engineering and science. Reinforcement Learning (RL), an AI paradigm where agents learn decision-making skills via trial-and-error interactions with their environment, has achieved significant success in handling complex decision tasks. However, these agents often struggle to generalize, exhibiting suboptimal performance on previously unseen tasks. Furthermore, once deployed, an AI that underperforms in a new task often lacks opportunities for improvement, as its learning process ceases after the initial training phase. These challenges restrict the practical use of such systems in real-world settings, such as sim2real. Open-ended Learning (OEL) seeks to overcome these limitations with the goal of producing learning systems that are robust to situations not explicitly considered during design and training.

Aims and Objectives

While there are many possible directions towards achieving OEL agents, this proposal specifically focuses on the development of automated curricula methods through novel techniques for synthetic environment generation. As such, the goals of this proposal are decomposed as follows:
Develop new methods for generating synthetic tasks/environments.
Use synthetic environment generation to develop novel Unsupervised Environment Design (UED) methods for automatic environment curricula generation.
Show that applying such curricula to agent training produces agents with strong out-of-distribution generalization.

Novelty of the Research Methodology

The proposed research primarily looks to extend work in the developing subfield of Unsupervised Environment Design (UED), which seeks to generate environments tailored to the current learning agent to facilitate continued learning. However, UED is currently limited to generating levels, which are configurations of a specific task, e.g., the layout of a maze in a navigation task. We seek to improve state-of-the-art UED methods to generate entire environments, not just levels, which are novel tasks for a more general agent to train on and solve. Doing so will require more sophisticated generative AI techniques, where we look to leverage recent advances such as diffusion methods. Ultimately, this work will operate on the intersection of multiple AI subfields, at a high level namely RL and generative AI.

Alignment to EPSRC's strategies and research areas

The proposed work falls under the EPSRC's Artificial intelligence technologies remit. It aligns with EPSRC's goals in a number of ways, including developing new AI techniques that are deployable in real-world situations. We will also work on the intersection of multiple AI subfields, in line with EPSRC's goal of supporting interdisciplinary research methods.
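
As a rough illustration of the level-based curriculum idea the abstract attributes to current UED work, the following minimal Python sketch samples random maze layouts and, at each step, trains a toy agent on the layout it is most likely to learn from. Everything here is an assumption made for illustration and is not taken from the project: the function names, the logistic agent model, the use of shortest-path length as difficulty, and the scoring rule p(1 - p), which is a simple stand-in rather than the regret estimates used by published UED methods.

```python
import math
import random
from collections import deque


def generate_level(rng, size=7, wall_prob=0.25):
    """Sample a random maze layout; True marks a wall, start and goal stay free."""
    grid = [[rng.random() < wall_prob for _ in range(size)] for _ in range(size)]
    grid[0][0] = grid[size - 1][size - 1] = False
    return grid


def shortest_path_length(grid):
    """Breadth-first search from top-left to bottom-right; None if unreachable."""
    size = len(grid)
    goal = (size - 1, size - 1)
    queue = deque([((0, 0), 0)])
    seen = {(0, 0)}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < size and 0 <= nc < size
            if inside and not grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


def success_probability(skill, difficulty):
    """Toy agent model (assumption): logistic in skill minus difficulty."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def learning_potential(skill, difficulty):
    """Illustrative score: highest when the agent succeeds about half the time."""
    p = success_probability(skill, difficulty)
    return p * (1.0 - p)


def run_curriculum(steps=30, candidates=20, seed=0):
    """Pick, at every step, the solvable candidate level with the highest score."""
    rng = random.Random(seed)
    skill = 12.0  # 12 moves is the shortest possible path in a 7x7 grid
    history = []
    for _ in range(steps):
        pool = []
        while len(pool) < candidates:
            level = generate_level(rng)
            length = shortest_path_length(level)
            if length is not None:  # discard unsolvable layouts
                pool.append((level, length))
        _, difficulty = max(pool, key=lambda item: learning_potential(skill, item[1]))
        # Toy update (assumption): skill grows only when the agent succeeds.
        if rng.random() < success_probability(skill, difficulty):
            skill += 0.5
        history.append((difficulty, skill))
    return history


if __name__ == "__main__":
    for difficulty, skill in run_curriculum(steps=10):
        print(f"trained on path length {difficulty}, skill now {skill:.1f}")
```

The sketch only varies the layout of one fixed task, which is the limitation the abstract describes; generating entirely new environments would require replacing `generate_level` with a far richer generative model.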
