
Towards a Compositional Generative Model of Human Vision

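Basic Information

Grant number:
10458624
Principal investigator:
DANIEL J KERSTEN
Amount:
$328,300
Host institution:
UNIVERSITY OF MINNESOTA
Host institution country:
United States
Project category:
Fiscal year:
2019
Funding country:
United States
Project period:
2019-09-30 to 2024-07-31
Project status:
Completed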

Project Abstract

Understanding object recognition has long been a central problem in vision science because of its applied utility and computational difficulty. Progress has been slow because of an inability to process complex natural images, where the largest challenges arise. Recently, advances in Deep Convolutional Neural Networks (DCNNs) spurred unprecedented success in natural image recognition. The general goal of this proposal is to leverage this success to test computational theories of human object recognition in natural images. However, DCNNs still markedly underperform humans when challenged with high levels of ambiguity, occlusion, and articulation. We hypothesize that humans' superior performance arises from the use of knowledge about how images and objects are structured. Preliminary evidence for this claim comes from the success of hybrid models that combine DCNNs for identifying features and parts in images with explicit knowledge of object and image structure. These computations occur within a hierarchy, which includes both top-down and bottom-up processing. The specific goal of the work proposed here is to strongly test whether these computational strategies (structured, hierarchical representations and bidirectional processing) are used to recognize objects in natural images. Human bodies are composed of hierarchically organized configurable parts, making them an ideal test domain. We examine the complete recognition process, from parts, to pairs of parts, to whole bodies, each in its own aim. Each aim also tests important sub-hypotheses about when and how the computational strategies are used. Aim 1 examines recognition of individual body parts, testing whether it depends on parsing images into more basic features and relationships, for example edges and materials. Aim 2 examines pairs of parts, testing the importance of knowledge of body connectedness relationships. Aim 3 examines perception of entire bodies, testing whether knowledge of global body structure guides bidirectional processing. In each aim, we first develop nested computer vision models that either do or do not make use of structural knowledge, to test whether that knowledge aids recognition. We then test whether human performance can be accounted for by the availability of that structural knowledge. We next measure neural activity with functional MRI to identify where and how it is used in cortex. Finally, we integrate these results to produce even stronger tests, using the nested models to predict human performance and confusion matrices as well as fMRI activity levels and confusion matrices. Altogether, this work will strongly test key theoretical accounts of object recognition in the most important domain, perception of natural images. The work, based on extensive preliminary data, measures and models the entire body recognition system. The models developed and tested here should surpass the state of the art and be useful for many real-world recognition tasks. The proposal will also lay the groundwork for future studies of recognition impaired by disease.
