Compositional Causal Model-based Reinforcement Learning

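Basic information

Grant number:
RGPIN-2020-06904
Principal investigator:
Ba, Jimmy
Amount:
$24,800
Host institution:
University of Toronto
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed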

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750019
Keywords:
Compositional Causal Model based Reinforcement

Project abstract

One of the most important unsolved problems in artificial intelligence is building agents with human-like creativity, curiosity, self-assessment, and commonsense reasoning. Recently, model-free reinforcement learning (MFRL) with deep neural networks has shown impressive performance in video game playing and locomotion control. Despite this success, MFRL methods are fundamentally limited by their trial-and-error nature, which requires millions of training examples to learn a reliable policy. A model-based reinforcement learning (MBRL) agent, by contrast, is capable of deliberate reasoning to achieve its goal. Unlike a model-free agent, an MBRL agent iteratively learns a model of the world and plans its actions according to that model. MBRL is appealing because the learned model allows the agent to predict its future and reason about the consequences of its own actions.

One of the ultimate goals of reinforcement learning research is to have agents that act in multiple environments and generalize previous learning experience to new situations. The ability to transfer knowledge across tasks is considered a critical aspect of any intelligent agent. The main objective of the proposed research is to introduce a general model-based reinforcement learning algorithm that brings together three key ideas (compositionality, causality, and intrinsic curiosity) that have been separately influential in machine learning over the past several decades.

The objectives of this 5-year project are as follows:

1. Establish baselines for comparison: train and evaluate state-of-the-art model-based reinforcement learning agents in the latest locomotion control physics simulators.
2. Derive a compositional forward dynamics model in which the internal representations are object-based.
3. Explore and evaluate different types of causal inference methods in the proposed compositional model, including linear independent component analysis, mutual information-based independence tests, and variational inference.
4. Develop planning-based algorithms to overcome non-stationary intrinsic rewards in exploration.
5. Test the hypothesis that causal representations simplify learning on new downstream tasks, help end users interpret data, and generalize to novel test examples.

I anticipate that this project will benefit both the deep learning and reinforcement learning communities in several ways, ranging from establishing a new approach to actively inferring causal factors, to elucidating new knowledge about exploration algorithms, to providing benchmarks and open-source implementations of state-of-the-art MFRL and MBRL agents that facilitate future research in machine learning.
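
The abstract describes an MBRL agent as one that iteratively learns a model of the world and plans its actions with that model. The loop can be illustrated with a minimal sketch, which is not the method proposed in this project. Everything in it is an assumption made for illustration: the toy point-mass environment, the linear least-squares dynamics model, the constant-action planner, and all names (`env_step`, `fit_model`, `plan`, `run`).

```python
import numpy as np

# Hypothetical toy environment (not from the proposal): a 1-D point mass
# with state [position, velocity] and a scalar force as the action.
A_TRUE = np.array([[1.0, 0.1], [0.0, 1.0]])
B_TRUE = np.array([[0.0], [0.1]])
START = np.array([1.0, 0.0])

def env_step(state, action):
    """True dynamics; the agent only sees the transitions it produces."""
    return A_TRUE @ state + B_TRUE @ action

def reward(state):
    """Negative squared norm of the state (best at rest at position 0)."""
    return -float(state @ state)

def fit_model(data):
    """Least-squares fit of next_state ~ [state, action] @ W."""
    states, actions, next_states = (np.array(x) for x in zip(*data))
    inputs = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return W

def plan(W, state, horizon=10, candidates=np.linspace(-1.0, 1.0, 21)):
    """Score each constant-action sequence inside the learned model and
    return the best action (a deliberately crude planner)."""
    returns = []
    for a in candidates:
        s, total = state, 0.0
        for _ in range(horizon):
            s = np.append(s, a) @ W  # imagined step, no environment call
            total += reward(s)
        returns.append(total)
    return np.array([candidates[int(np.argmax(returns))]])

def run(iterations=3, steps=30, seed=0):
    rng = np.random.default_rng(seed)
    data, state = [], START
    for _ in range(20):  # bootstrap the model with random actions
        action = rng.uniform(-1.0, 1.0, size=1)
        next_state = env_step(state, action)
        data.append((state, action, next_state))
        state = next_state
    for _ in range(iterations):
        W = fit_model(data)         # 1. learn the world model
        state = START
        for _ in range(steps):      # 2. act by planning in the model
            action = plan(W, state)
            next_state = env_step(state, action)
            data.append((state, action, next_state))  # 3. gather new data
            state = next_state
    return W, state

W, final_state = run()
```

In this sketch the planner never calls `env_step`; it evaluates candidate actions only through the learned matrix `W`. That separation between real interaction and planning in the model is what distinguishes the model-based loop from trial-and-error MFRL. A realistic agent would replace the linear model with a learned neural dynamics model and the constant-action search with a stronger planner.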
