
Object-Centric Visual Representation And Reinforcement Learning

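Basic information

Grant number:
2722103
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2022
Funding country:
United Kingdom
Duration:
2022 to (no data)
Project status:
Not concluded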

Source:
https://gtr.ukri.org/projects?ref=studentship-2722103
Keywords:
Object Centric Visual Representation Reinforcement

Project abstract

Abstract
We will develop new object-centric sequence models for vision, with the intention of improving data efficiency and robustness to out-of-distribution environments for video prediction and vision-based reinforcement learning.

Aims And Objectives
The first stage of the proposed research aims to combine temporal predictive coding with object-centric learning and apply the resulting model, OCTPC, to video prediction. This stage will aim to answer several questions: What requirements should an object-centric video model satisfy? How can these requirements be reflected in the design of OCTPC? How should object properties, instantaneous attribute variables, stochastic temporal evolution of attribute variables, and inter-object relationships be represented? How should object instances be represented differently to object types? How precisely should the temporal predictive coding mechanism be specified such that it is capable of learning sufficiently long-term dependencies? What are the commonalities and differences between existing approaches to object-centric learning, and how do these relate to the performance of such algorithms? How does OCTPC perform on a variety of video-prediction benchmarks? Which design choices and hyperparameters have the greatest effect on performance? How do the performance, behaviour, and learning efficiency of OCTPC compare with models which aren't object-centric, or which don't use predictive coding?

The second stage of the proposed research aims to explore the possible benefits of OCTPC in reinforcement learning. There are two primary motivations for doing so. Firstly, reinforcement learning effectively depends on being able to predict the future, because the ultimate objective is to choose a policy which maximises expected long-term future reward. It is plausible that using a performant video prediction model as a component in model-based reinforcement learning would allow an agent to make more accurate predictions of how changes in policy would affect future experience, and therefore how the policy should be changed in order to maximise future reward. Secondly, there is a neuroscientific principle which states that "the processing function of neocortical modules is qualitatively similar in all neocortical regions... there is nothing intrinsically motor about the motor cortex, nor sensory about the sensory cortex." [13]. Therefore, if it is found that object-centric inductive biases are useful for video prediction, then it may be the case that similar inductive biases are useful in policy representations as well. It would be interesting to compare such policy representations to existing work in hierarchical reinforcement learning [17], and to explore whether such representations can improve sample efficiency in reinforcement learning and robustness to out-of-distribution environments.

Novelty Of The Research Methodology
To our knowledge, combining temporal predictive coding with object-centric learning has not previously been explored. It is plausible that exploring this combination will provide valuable contributions and insights to machine learning, while also providing value to the cognitive sciences by moving closer to an understanding of human intelligence on the algorithmic level.

Alignment To EPSRC's Strategies And Research Areas
This research proposal aligns with the Artificial intelligence and robotics theme and with the areas of Artificial intelligence technologies and Image and vision computing.
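
The first motivation for the second stage, that a predictive model lets an agent estimate how a change in policy would affect future reward, can be illustrated with a minimal sketch. Nothing below comes from the project itself: `toy_model` is a hypothetical stand-in for a learned video prediction model, the three policies and the discount factor `gamma` are assumptions chosen for illustration, and the sketch only shows the general idea of comparing policies by rolling them out inside a model.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int
Model = Callable[[State, Action], Tuple[State, float]]
Policy = Callable[[State], Action]


def toy_model(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical predictive model (assumption, not the project's model).

    Predicts the next position on a 1-D line and a reward that is
    highest (zero) when the agent sits at position 3.
    """
    next_state = state + action
    reward = float(-abs(3 - next_state))
    return next_state, reward


def predicted_return(policy: Policy, model: Model, start: State,
                     horizon: int, gamma: float = 0.9) -> float:
    """Discounted return of `policy` when rolled out inside `model`."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        state, reward = model(state, policy(state))
        total += discount * reward
        discount *= gamma
    return total


def best_policy(policies: Sequence[Policy], model: Model,
                start: State, horizon: int) -> Policy:
    """Pick the candidate policy with the highest predicted return."""
    return max(policies,
               key=lambda p: predicted_return(p, model, start, horizon))


# Illustrative candidate policies (assumptions).
def always_right(state: State) -> Action:
    return 1


def stay(state: State) -> Action:
    return 0


def seek_target(state: State) -> Action:
    # Move towards position 3, then stop.
    return 1 if state < 3 else (-1 if state > 3 else 0)


chosen = best_policy([always_right, stay, seek_target], toy_model,
                     start=0, horizon=5)
```

In this toy setting the quality of the choice depends entirely on how accurately the model predicts future states and rewards, which is the dependency the abstract points to when it proposes a video prediction model as a component of model-based reinforcement learning.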
