
Reinforcement Learning in Large Complex Partially Observable Environments

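Basic information

Grant number: 1749045
Principal investigator:
Amount: --
Host institution: University of Oxford
Host institution country: United Kingdom
Project category: Studentship
Fiscal year: 2016
Funding country: United Kingdom
Start and end dates: 2016 to (no data)
Project status: Completed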

Source: https://gtr.ukri.org/projects?ref=studentship-1749045
Keywords: Reinforcement Learning Large Complex Partially

Project abstract

This project falls within the EPSRC Research Area: Artificial Intelligence Technologies; EPSRC Research Theme: Information and Communication Technologies.

This research project explores the use of Reinforcement Learning to achieve a sophisticated level of control in large, partially observable environments that exhibit complex dynamics and long-term dependencies. Reinforcement Learning (RL) is a branch of Machine Learning that deals with how to act in an environment in order to maximise some notion of cumulative reward. To accomplish this, RL agents must carefully balance their exploration and exploitation of the environment, which is a difficult task in large, complex environments. In recent years, model-free approaches have been applied to such environments with much success. Most notably, approaches involving Deep Q Networks have been able to play a range of Atari games with superhuman performance.

We wish to continue this line of research and further investigate the application of Deep Q Networks and their many extensions to environments that require long-term planning. Specifically, we aim to produce an agent that can learn how to play a real-time strategy game. To accomplish such a goal, an agent must be adept at many complex tasks. In addition to learning the consequences of its actions, an agent must learn to formulate a long-term goal to build towards, and also learn how to react to changes in its environment. Even humans struggle to play real-time strategy games without some prior training or guidance, which highlights the complexity of the problem. We believe that pursuing a complex problem such as this will lead to the development of useful ideas and techniques that are applicable in a multitude of other areas.

To tackle this problem we will make use of ideas from Hierarchical Reinforcement Learning. We strongly believe that decomposing a problem into simpler sub-problems is a crucial part of being able to tackle complex environments, since the larger problem is often intractable whereas the simpler sub-problems are significantly easier to solve. In addition, we will make use of recent advances in Machine Learning, specifically Deep Learning, to further refine the agent's internal representation of the environment. An accurate representation of the environment is crucial for acting intelligently in partially observable domains, especially in real-time strategy games, where the agent must also learn to predict its opponent's behaviour.
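
The cumulative-reward objective and the exploration/exploitation trade-off described in the abstract can be illustrated with a minimal sketch of tabular Q-learning with an epsilon-greedy policy. This is a much simpler stand-in for the Deep Q Networks the abstract mentions, not the project's method: a DQN replaces the table with a neural network. The five-state chain environment, the names `step`, `greedy_action`, and `train`, and all hyperparameter values are assumptions chosen for this example.

```python
import random

N_STATES = 5          # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the learned table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-goal states 0..3.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

With these settings the greedy policy should choose "right" in every non-goal state. The discount factor `gamma` is what makes reaching the goal sooner worth more than reaching it later, and `epsilon` controls how often the agent tries a non-greedy action. The environments targeted by the abstract are far larger and only partially observable, so a lookup table like `q_table` would not be feasible there.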
