Shape Perception in Computer Vision

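Basic Information

Grant number:
RGPIN-2022-03366
Principal investigator:
Dickinson, Sven
Amount:
$25,500
Host institution:
University of Toronto
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed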

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750027
Keywords:
Shape Perception Computer Vision

Project Abstract

Consider the task of modeling a cup, so that a computer vision system can recognize any cup placed in front of a camera. How do we model the cup in terms of its parts? How do we recover those parts from an image? How do we determine if an unusual object we've never seen before can be used as a cup? How do we utilize the power of deep learning to perform these tasks? And finally, how can our remarkable human vision system inform the design of these deep learning solutions? Shedding light on these fundamental questions in shape perception could have a major impact on such downstream tasks as object representation, recognition, and manipulation. In our recent work, we demonstrated the important role that local symmetry plays in human shape perception, and showed that the performance of a standard deep learning architecture can be improved when the input is augmented with local symmetry information, i.e., the framework cannot compute this information on its own. We extend that work in two important ways: 1) we will develop a framework that can compute this valuable information; and 2) we will embed this module in a larger end-to-end deep learning framework for recognition tasks, encouraging a more modular recognition pipeline that is more interpretable and respects the important role that symmetry plays in human vision.
Our second project addresses the problem of 3-D shape representation learning. Specifically, given a set of 3-D images of objects belonging to a category, how do we learn a representation for that class of objects that can be used to generate/recognize new examples belonging to the class? In our previous work, we learned to tease apart, or "disentangle", the learned representation to differentiate between extrinsic variations in the object due to articulation (e.g., the different configurations of a particular tiger's legs) and intrinsic variations in the object due to within-class variation (e.g., the variations in shape across the family of big cats). We will extend that work in representation learning to learn compositional models of objects in terms of their natural part structure, a concept deeply rooted in both human and computer vision. In our third project, we move beyond learning purely geometric representations of shape to learn more abstract representations that take into account how the object will be used, i.e., its affordances. Returning to our cup example, a pot could be used as a cup, but would never be categorized as a cup in any modern-day recognition system. It's not the precise geometry of the cup that affords the task of drinking, but the fact that the object affords both containment and a handle with which to tilt the container to pour the liquid. We extend our previous work in learning the mapping between shape and affordance to leverage the power of physics-based simulators and new shape representations in order to learn how to grasp an object in order to facilitate a particular task.
