
Collaborative Research: RI: Medium: Learning Compositional Implicit Representations for 3D Scene Understanding

Basic Information

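Award number: 2211260
Principal investigator: Vincent Sitzmann
Amount: $400,000
Institution: Massachusetts Institute of Technology
Institution country: United States
Award type: Standard Grant
Fiscal year: 2022
Funding country: United States
Period: 2022-10-01 to 2026-09-30
Status: Ongoing (not yet concluded)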

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2211260&HistoricalAwards=false
Keywords: Collaborative Research RI Medium Learning

Project Abstract

Scene understanding systems take visual inputs, such as images or videos, and reconstruct and interpret the underlying scene in terms of its 3D structure, objects such as cars and people, and other scene properties. Such systems are crucial to applications in computer vision, computer graphics, and robotics, including self-driving cars. To represent the 3D world observed in the input imagery, these systems use mathematical models; in recent years, neural networks have become a popular choice for these models because of their expressiveness and their ability to capture fine details. However, current neural scene representations are good only at modeling the specific conditions under which a scene was observed and cannot generalize to new scenarios, which limits their use in many applications. For example, if a self-driving car is trained to model scenes using only images from sunny days, its perception system might fail on rainy or snowy days. This project aims to introduce new scene modeling techniques that will enable machines to perceive and reconstruct 3D scenes in a more generalizable way. The investigators will integrate findings from this research into course development and student advising, and will partner with educational and non-profit organizations to teach AI, vision, and graphics to underrepresented students. In this project, the investigators will explore new methods that make representations capable of encoding more structure (e.g., light fields) and ground them in physics. Designing such representations requires knowledge from AI, computer vision, and computer graphics.
The key innovations include: a new class of scene representations that aims to bridge the ability of implicit neural representations to capture scene details and the ability of physical representations to model scene structure; new methods that infer the representation from raw images and videos, using new parametrizations to enable data-efficient, self-supervised learning; and new methods that leverage the representation for downstream computer vision and graphics tasks, such as interactive design and scene synthesis.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Unsupervised Discovery and Composition of Object Light Fields

DOI: 10.48550/arxiv.2205.03923
Published: 2022-05
Journal: arXiv
Impact factor: 0
Authors: Cameron Smith; Hong-Xing Yu; Sergey Zakharov; F. Durand; J. Tenenbaum; Jiajun Wu; V. Sitzmann
Corresponding authors: Cameron Smith; Hong-Xing Yu; Sergey Zakharov; F. Durand; J. Tenenbaum; Jiajun Wu; V. Sitzmann

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

Published: 2022-07
Impact factor: 0
Authors: Prafull Sharma; A. Tewari; Yilun Du; Sergey Zakharov; Rares Ambrus; Adrien Gaidon; W. Freeman; F. Durand; J. Tenenbaum; V. Sitzmann
Corresponding authors: Prafull Sharma; A. Tewari; Yilun Du; Sergey Zakharov; Rares Ambrus; Adrien Gaidon; W. Freeman; F. Durand; J. Tenenbaum; V. Sitzmann

Learning to Render Novel Views from Wide-Baseline Stereo Pairs

DOI: 10.1109/cvpr52729.2023.00481
Published: 2023-04
Journal: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Impact factor: 0
Authors: Yilun Du; Cameron Smith; A. Tewari; V. Sitzmann
Corresponding authors: Yilun Du; Cameron Smith; A. Tewari; V. Sitzmann

Other Publications by Vincent Sitzmann

Controlling diverse robots by inferring Jacobian fields with deep networks

DOI: 10.1038/s41586-025-09170-0
Published: 2025-06-25
Journal: Nature
Impact factor: 48.500
Authors: Sizhe Lester Li; Annan Zhang; Boyuan Chen; Hanna Matusik; Chao Liu; Daniela Rus; Vincent Sitzmann
Corresponding author: Vincent Sitzmann