Reinforcement Learning Algorithms Designed to Persist

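Basic Information

Grant number:
RGPIN-2022-04035
Principal investigator:
Bellemare, Marc
Amount:
$20,400
Host institution:
McGill University
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed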

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750337
Keywords:
Reinforcement Learning Algorithms Designed Persist

Project Abstract

The field of reinforcement learning is concerned with understanding how intelligent agents can, from trial and error, learn to make decisions that lead to the best outcomes. In silico, its techniques have been applied to a wide range of domains, producing computer programs that surpass the world's human champions at the game of Go (2016), can autonomously navigate balloons in the stratosphere (2020), and can design electronics in a fraction of the time taken by human experts (2021). To achieve this level of performance, however, these programs require weeks or even months of training. This is because the most effective reinforcement learning methods are designed to learn to solve a given task from scratch. Using present algorithms it is difficult, if not downright impossible, to carry over the learnings from one version of the program to the next. This makes it hard, for example, to support a learning system that evolves and learns over a period of years - a common scenario in practical applications, where a research and development team might continue to improve the learning software over time. The research in this proposal aims to address this shortcoming by studying methods and principles with which previously acquired experience may be carried across iterations of a learning system. Doing so requires understanding how an agent's immediate experience can be synthesized into a more permanent form called a representation of state, and also how an agent can purposefully act to acquire new information that helps it gain a better understanding of its environment. Fundamental advances in this direction will make it possible to design learning systems that benefit from years, if not decades, of experience and can therefore make substantially better decisions.
