Making Machine Learning on Static and Dynamic 3D Data Practical

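Basic information

Grant number:
405799936
Principal investigator:
Professor Dr.-Ing. Matthias Nießner
Amount:
--
Host institution:
Lehrstuhl für Informatik XV: Graphik und Visualisierung
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2019
Funding country:
Germany
Duration:
2018-12-31 to 2022-12-31
Project status:
Completed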

Source:
https://gepris.dfg.de/gepris/projekt/405799936?language=en
Keywords:
Making Machine Learning Static Dynamic

Project abstract

In the last five years, advances in deep learning have led to significant progress in allowing computers to understand the real world from visual input, thus opening up many opportunities ranging from robotics to virtual and augmented reality, as well as medical and Industry 4.0 applications. Most of these machine learning architectures are convolutional neural networks (CNNs), which are able to learn powerful features from images, and even generate highly realistic pictures from scratch using generative adversarial networks (GANs). In the 2D image domain, we have seen tremendous success in both discriminative and generative tasks.
Unfortunately, for 3D data, e.g. data obtained from 3D scans on autonomous cars, research is still in its infancy. This 3D direction requires further exploration, as our world is inherently three-dimensional (e.g. humans see with two eyes), and even four-dimensional when considering the temporal domain. In fact, performing scene understanding in 3D has significant advantages; for instance, a machine learning approach does not need to learn viewpoint invariance, and thus requires less training data. However, the additional third dimension (and fourth for dynamics) comes with significant computational and memory overhead, which has so far been the major bottleneck in these applications.
In this proposal, we address this shortcoming by developing efficient machine learning algorithms for 3D and 4D data analysis. In particular, we will develop deep learning architectures and training methods capable of efficiently modeling different types of static and dynamic 3D data representations, including sparse spatial and temporal representations on voxel volumes, RGB-D images, point clouds, multi-view images, and meshes. We will further construct new datasets designed for our scenario, captured from the real world as well as synthetically generated with simulated renderings, and augmented to reduce the reality gap between artificial and real data.
Finally, we will develop new neural network architectures designed for discriminative and generative applications embedded in spatial and specifically temporal domains. In order to showcase our learning methods, we will apply them to static and dynamic 3D reconstruction tasks, as well as semantic scene understanding in 3D and 4D with an emphasis on fusing the spatial and temporal domains.
