权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Collaborative Research: NCS-FR: Beyond the ventral stream: Reverse engineering the neurocomputational basis of physical scene understanding in the primate brain

合作研究：NCS-FR：超越腹侧流：逆向工程灵长类大脑中物理场景理解的神经计算基础

基本信息

批准号：
2123963
负责人：
Daniel Yamins
金额：
$ 75万
依托单位：
Stanford University
依托单位国家：
美国
项目类别：
Continuing Grant
财政年份：
2021
资助国家：
美国
起止时间：
2021-10-01 至 2024-09-30
项目状态：
已结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2123963&HistoricalAwards=false
关键词：
Collaborative Research NCS FR Beyond

项目摘要

The last ten years have witnessed an astonishing revolution in AI, with deep neural networks suddenly approaching human-level performance on problems like recognizing objects in an image and words in an audio recording. But impressive as these feats are, they fall far short of human-like intelligence. The critical gap between current AI and human intelligence is that, beyond just classifying patterns of input, humans build mental models of the world. This project begins with the problem of physical scene understanding: how one extracts not just the identities and locations of objects in the visual world, but also the physical properties of those objects, their positions and velocities, their relationships to each other, the forces acting upon them, and the effects of forces that could be exerted on them. It is hypothesized that humans represent this information in a structured mental model of the physical world, and use that model to predict what will happen next, much as the physics engine in a video game generates physically plausible future states of virtual worlds. To test this idea, computational models of physical scene understanding will be built and tested for their ability to predict future states of the physical world in a variety of scenarios. Performance of these models will then be compared to humans and to more traditional deep network models, both in terms of their accuracy on each task, and their patterns of errors. Computational models that incorporate structured representations of the physical world will then be tested against standard convolutional neural networks in their ability to explain neural responses of the human brain (using fMRI) and the monkey brain (using direct neural recording). These computational models will provide the first explicit theories of how physical scene understanding might work in the human brain, at the same time advancing the ability of AI systems to solve the same problems. Because the ability to understand and predict the physical world is essential for planning any action, this work is expected to help advance many technologies that require such planning, from robotics to self-driving cars to brain-machine interfaces. Each of the participating labs will also expand their established track records of recruiting, training, and mentoring women and under-represented minorities at the undergraduate, graduate, and postdoctoral levels. Finally, the collaborating laboratories will continue and increase their involvement in the dissemination of science to the general public, via public talks, web sites, and outreach activities.Deep neural networks have revolutionized object recognition in computers as well as understanding of object recognition in the primate brain, but object recognition is just one aspect of vision, and the ventral stream is just one of many brain systems. Studying physical scene understanding is a step toward scaling this reverse-engineering approach up to the rest of the mind and brain. Predicting what will happen next and planning effective action requires understanding the physical basis and physical relationships in the visual world. Yet it is unknown how humans do this or how machines could. Both challenges are addressed in this project by the building of image computable, neurally mappable computational models of physical scene understanding and prediction (Thread I), and using these models as explicit hypotheses for how the brain might accomplish these tasks, which will then be tested with behavioral and neural data from humans (Thread II) and non-human primates (Thread III). This project aims to make a transformative leap in understanding: from small-scale, special-case models and isolated experimental tests to an integrated large-scale, general-purpose model of a major swathe of the primate brain, that functionally explains much of the immediate content of our perceptual experience in every scene that confronts us. The work will advance theory by developing the first image-computable models capable of human-level physical scene understanding and prediction. Beyond understanding of the mind and brain, this research is directly relevant to AI and robotics (which require physical scene understanding), and brain-machine interfaces (which require understanding of the relevant neural codes). For the broader research community, the project will a) develop public datasets, benchmark tasks, and challenges, b) host adversarial collaborations to address these challenges, and c) host interdisciplinary workshops linking research communities from psychology to AI to neuroscience to address the fundamental questions that span these fields.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

在过去的十年里，人工智能发生了惊人的革命，深度神经网络在识别图像中的物体和录音中的文字等问题上突然接近人类水平的性能。但是，尽管这些壮举令人印象深刻，但它们远远达不到人类的智能。当前人工智能和人类智能之间的关键差距在于，除了对输入模式进行分类之外，人类还建立了世界的心理模型。该项目从物理场景理解的问题开始：如何提取不仅是视觉世界中的对象的身份和位置，而且这些对象的物理属性，它们的位置和速度，它们之间的关系，作用在它们身上的力，以及可能施加在它们身上的力的效果。假设人类在物理世界的结构化心理模型中表示这些信息，并使用该模型来预测接下来会发生什么，就像视频游戏中的物理引擎生成虚拟世界的物理上合理的未来状态一样。为了测试这一想法，将建立物理场景理解的计算模型，并测试它们在各种场景中预测物理世界未来状态的能力。然后，这些模型的性能将与人类和更传统的深度网络模型进行比较，无论是在每个任务的准确性方面，还是在错误模式方面。然后，将结合物理世界的结构化表示的计算模型与标准卷积神经网络进行测试，以测试它们解释人脑（使用fMRI）和猴脑（使用直接神经记录）的神经反应的能力。这些计算模型将为物理场景理解如何在人脑中工作提供第一个明确的理论，同时提高人工智能系统解决相同问题的能力。由于理解和预测物理世界的能力对于规划任何行动都至关重要，因此这项工作有望帮助推进许多需要这种规划的技术，从机器人技术到自动驾驶汽车再到脑机接口。每个参与实验室还将扩大其在本科生，研究生和博士后水平上招募，培训和指导女性和代表性不足的少数民族的既定记录。最后，合作实验室将继续并增加他们通过公开讲座、网站和外展活动向公众传播科学的参与。深度神经网络已经彻底改变了计算机中的物体识别以及对灵长类大脑中物体识别的理解，但物体识别只是视觉的一个方面，腹侧流只是许多大脑系统之一。研究物理场景理解是将这种逆向工程方法扩展到思想和大脑的其他部分的一步。预测接下来会发生什么并计划有效的行动需要理解视觉世界中的物理基础和物理关系。然而，我们不知道人类是如何做到这一点的，也不知道机器是如何做到的。这两个挑战都在这个项目中通过建立物理场景理解和预测的图像可计算，神经可映射的计算模型来解决（线程I），并使用这些模型作为大脑如何完成这些任务的明确假设，然后将使用来自人类（线程II）和非人类灵长类动物（线程III）的行为和神经数据进行测试。该项目旨在实现理解的变革性飞跃：从小规模的特殊情况模型和孤立的实验测试到灵长类动物大脑主要区域的综合大规模通用模型，该模型在功能上解释了我们面对的每个场景中感知体验的大部分即时内容。这项工作将通过开发第一个能够理解和预测人类物理场景的图像可计算模型来推进理论。除了对心灵和大脑的理解之外，这项研究还与人工智能和机器人技术（需要理解物理场景）以及脑机接口（需要理解相关的神经代码）直接相关。对于更广泛的研究社区，该项目将a）开发公共数据集，基准任务和挑战，B）主持对抗性合作以应对这些挑战，以及c）主办跨学科研讨会，将研究社区从心理学到人工智能再到神经科学，以解决跨越这些领域的基本问题。该奖项反映了NSF的法定使命，并被认为值得通过使用基金会的学术价值和更广泛的影响审查标准。

项目成果

期刊论文数量（2）

专著数量（0）

科研奖励数量（0）

会议论文数量（0）

专利数量（0）

Physion: Evaluating Physical Prediction from Vision in Humans and Machines

DOI：
发表时间：
2021-06
期刊：
ArXiv
影响因子：
0
作者：
Daniel Bear;E. Wang;Damian Mrowca;Felix Binder;Hsiau-Yu Fish Tung;R. Pramod;Cameron Holdaway;Sirui Tao;Kevin A. Smith;Li Fei-Fei-Li-Fei-Fei-48004138;N. Kanwisher;J. Tenenbaum;Daniel Yamins;Judith E. Fan
通讯作者：
Daniel Bear;E. Wang;Damian Mrowca;Felix Binder;Hsiau-Yu Fish Tung;R. Pramod;Cameron Holdaway;Sirui Tao;Kevin A. Smith;Li Fei-Fei-Li-Fei-Fei-48004138;N. Kanwisher;J. Tenenbaum;Daniel Yamins;Judith E. Fan

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

DOI：
发表时间：
2020-07
期刊：
ArXiv
影响因子：
0
作者：
Chuang Gan;Jeremy Schwartz;S. Alter;Martin Schrimpf;James Traer;Julian De Freitas;J. Kubilius;Abhishek Bhandwaldar;Nick Haber;Megumi Sano;Kuno Kim;E. Wang;Damian Mrowca;Michael Lingelbach;Aidan Curtis;Kevin T. Feigelis;Daniel Bear;Dan Gutfreund;David Cox;J. DiCarlo;Josh H. McDermott;J. Tenenbaum;Daniel L. K. Yamins
通讯作者：
Chuang Gan;Jeremy Schwartz;S. Alter;Martin Schrimpf;James Traer;Julian De Freitas;J. Kubilius;Abhishek Bhandwaldar;Nick Haber;Megumi Sano;Kuno Kim;E. Wang;Damian Mrowca;Michael Lingelbach;Aidan Curtis;Kevin T. Feigelis;Daniel Bear;Dan Gutfreund;David Cox;J. DiCarlo;Josh H. McDermott;J. Tenenbaum;Daniel L. K. Yamins

DOI：
{{ item.doi }}
发表时间：
{{ item.publish_year }}
期刊：
{{ item.journal_name }}
影响因子：
{{ item.factor }}
作者：
{{ item.authors }}
通讯作者：
{{ item.author }}

数据更新时间：{{ journalArticles.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ monograph.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ sciAawards.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ conferencePapers.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ patent.updateTime }}

Daniel Yamins其他文献

Dynamic Task Assignment in Robot Swarms

机器人群中的动态任务分配

DOI：
发表时间：
2005
期刊：
Robotics: Science and Systems
影响因子：
0
作者：
James McLurkin;Daniel Yamins
通讯作者：
Daniel Yamins

FAR: End-to-End Vibrotactile Distributed System Designed to Facilitate Affect Regulation in Children Diagnosed with Autism Spectrum Disorder Through Slow Breathing

FAR：端到端振动触觉分布式系统，旨在通过缓慢呼吸促进被诊断患有自闭症谱系障碍的儿童的情绪调节

DOI：
10.1145/3491102.3517619
发表时间：
2022
期刊：
Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
影响因子：
0
作者：
Pardis Miri;Mehul Arora;Aman Malhotra;R. Flory;Stephanie Hu;Ashley Lowber;Ishan Goyal;Jacqueline Nguyen;J. Hegarty;Marlo D Kohn;David Schneider;Heather Culbertson;Daniel Yamins;Lawrence K. Fung;A. Hardan;J. Gross;Keith Marzullo
通讯作者：
Keith Marzullo

跨图像模态映射视觉对象之间的核心相似性

DOI：
发表时间：
2014
期刊：
International Conference on Computer Graphics and Interactive Techniques
影响因子：
0
作者：
Judith E. Fan;Daniel Yamins;J. DiCarlo;N. Turk
通讯作者：
N. Turk

The BabyView camera: Designing a new head-mounted camera to capture children's early social and visual environments.

BabyView 相机：设计一款新型头戴式相机，用于捕捉儿童早期的社交和视觉环境。

DOI：
10.3758/s13428-023-02206-1
发表时间：
2023
期刊：
Behavior research methods
影响因子：
5.4
作者：
Bria L Long;Sarah Goodin;George Kachergis;V. Marchman;Samaher F. Radwan;Robert Z Sparks;Violet Xiang;Chengxu Zhuang;Oliver Hsu;Brett Newman;Daniel Yamins;Michael C. Frank
通讯作者：
Michael C. Frank