
Open-Ended Discovery of Skill Hierarchies in Artificial Intelligence

Basic information

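Grant number:
2278914
Principal investigator:
Amount:
--
Host institution:
University of Bath
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2019
Funding country:
United Kingdom
Duration:
2019 to (no data)
Project status:
Completed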

Source:
https://gtr.ukri.org/projects?ref=studentship-2278914
Keywords:
Open Ended Discovery Skill Hierarchies

Project abstract

People solve complex tasks every day by decomposing them into smaller sub-tasks. For instance, the task of making a cup of tea can be decomposed into the sub-tasks of boiling the kettle, adding sugar, adding a tea-bag, grasping the cup, and so on. These sub-problems can themselves be decomposed into even smaller sub-problems, all the way down to the individual muscle movements involved, forming a hierarchy of skills useful for solving the problem.

Of course, learning how to make a cup of tea at the scale of muscle movements would be an unreasonably large computational undertaking. Much of our problem-solving ability is attributed to our ability to discover and plan using hierarchically organised higher-level behaviours. Planning and learning a sequence of a few high-level behaviours is clearly less computationally expensive than planning and learning a sequence of perhaps millions of primitive actions.

Two key open research questions are how such useful skills should be characterised, and how an artificially intelligent agent should go about autonomously discovering them. It is these questions that we hope, at least in part, to address during this research project.

We frame this research within the well-developed framework of Reinforcement Learning (RL), which concerns itself broadly with how artificially intelligent agents should learn optimal behavioural policies through interaction with their environments. Many RL methods, even those considered state-of-the-art, operate using primitive actions: they are still making cups of tea at the scale of muscle movements, as it were. The branch of RL which considers higher-level behaviours taken over varying numbers of timesteps is known as Hierarchical Reinforcement Learning (HRL), in reference to how skills can be organised hierarchically.

Explicitly, the main objective of this research project is to develop an HRL algorithm, or set of HRL algorithms, which endow artificially intelligent agents with the ability to discover a hierarchy of useful high-level behaviours through interaction with their environment.

There are several desirable properties that the algorithm(s) developed over the course of this project should possess. Firstly, the algorithms should be developmental, with higher-level, more complex skills being constructed hierarchically from lower-level ones as time goes on. Secondly, the algorithms should be domain-independent to ensure their applicability to many types of problem. Thirdly, it would be desirable for the algorithms to perform well in tasks which are currently considered difficult (e.g. "hard exploration" problems such as the game of Montezuma's Revenge).

These desirable properties stem partly from various shortcomings in current HRL methods. For instance, many existing HRL methods are applicable only in discrete domains or those with small state-spaces, or otherwise would not scale well to larger domains or those with continuous state-spaces. This limits their applicability to many of the complex problems that are ultimately of interest to RL.

The benefits of developing such algorithms would be wide-ranging, allowing reinforcement learning to be applied to larger, more complex problems to which current methods simply do not scale well.
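The tea-making decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the project's algorithm: the hierarchy here is written by hand, whereas the project aims to have an agent discover it. The four sub-tasks of `make_tea` come from the abstract; every lower-level action name (for example `fill_kettle` or `close_fingers`) and the helper functions `expand` and `depth` are hypothetical, chosen only to show how a short high-level plan unrolls into a longer sequence of primitive actions.

```python
from typing import Dict, List

# Hand-written, hypothetical hierarchy: each skill maps to its ordered sub-skills.
# Any name that is not a key in this mapping is treated as a primitive action.
SKILLS: Dict[str, List[str]] = {
    "make_tea": ["boil_kettle", "grasp_cup", "add_tea_bag", "add_sugar"],
    "boil_kettle": ["grasp_kettle", "fill_kettle", "switch_on_kettle"],
    "grasp_cup": ["reach_for_cup", "close_fingers"],
    "add_tea_bag": ["grasp_tea_bag", "drop_in_cup"],
    "add_sugar": ["grasp_spoon", "scoop_sugar", "drop_in_cup"],
}


def expand(skill: str, skills: Dict[str, List[str]]) -> List[str]:
    """Recursively unroll a skill into its sequence of primitive actions."""
    if skill not in skills:  # primitive action: nothing left to decompose
        return [skill]
    primitives: List[str] = []
    for sub_skill in skills[skill]:
        primitives.extend(expand(sub_skill, skills))
    return primitives


def depth(skill: str, skills: Dict[str, List[str]]) -> int:
    """Number of hierarchy levels below a skill (0 for a primitive action)."""
    if skill not in skills:
        return 0
    return 1 + max(depth(sub_skill, skills) for sub_skill in skills[skill])


high_level_plan = SKILLS["make_tea"]
primitive_plan = expand("make_tea", SKILLS)
print(len(high_level_plan), "high-level steps vs", len(primitive_plan), "primitive actions")
```

Even in this toy case the high-level plan is shorter than the primitive one, and the gap widens with every extra level of decomposition, which is the computational argument the abstract makes for planning over skills.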
