Dense Monocular Reconstruction and Semantic Segmentation of 3D Environments

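Basic information

Grant number:
2116531
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2018
Funding country:
United Kingdom
Start and end dates:
2018 to (no data)
Project status:
Completed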

Source:
https://gtr.ukri.org/projects?ref=studentship-2116531
Keywords:
Dense Monocular Reconstruction Semantic Segmentation

Project abstract

Most applications of robotics and augmented reality (AR) rely on some form of 3D model of the environment in which they operate in order to interact with that environment. At the most basic level, these models provide information about where things are in the environment, allowing systems to interact with it accurately and safely. Conventionally there has been a trade-off between the quality and the density of the model (i.e. accurate models typically consist of a cloud of points with no information about the region between the points). This sparseness typically arises from the use of 3D scanners, such as lasers. There are scanners that are able to produce dense information; however, these scanners are either limited to indoor operation or have an extremely short range (typically 1-5 m). If more meaningful interaction is desired, then semantic information, which describes the contents of the environment, is required.

The ability to quickly generate accurate models of 3D spaces has a variety of impacts, from increasing the ease with which automated systems can navigate and explore the world, to improving the interaction of visuals generated by AR with the real world. Further, there are potential applications for disability and access, whereby buildings or areas can be easily and quickly mapped, and then scale models printed, to aid those with impaired vision to navigate areas. Semantic information allows systems to understand the world and answer questions about it. For example, with semantic information, we can ask questions like "Where are the chairs in this room?"

We aim to investigate systems for both reconstructing dense 3D models of environments and generating semantic segmentations of those models. We hope to be able to develop a system that is capable of generating these models in real time, and ultimately on embedded and mobile devices, such as an iPhone. Further, we aim to be able to generate these models in places where current sensor-based systems cannot, i.e. outdoors and over ranges greater than 5 m.

To improve on the existing techniques for reconstructing 3D models from monocular images, we plan to utilize convolutional neural networks (CNNs), alongside existing geometric methods. However, rather than computing a depth image, we propose to directly compute a full 3D model from the network. Although this process requires significantly more memory, we believe that this will allow for better integration of the available information. We suspect that the direct use of 3D information will be of particular importance for semantic segmentation, where certain viewing angles of objects can be misleading (e.g. a chair from above looks a lot like a table). We also plan to investigate the potential of recurrent neural networks to improve the quality of the reconstructions over a sequence of images, as this input pipeline mimics those you would most likely see in real-world data acquisition scenarios.

This project falls within the EPSRC Information and Communication Technologies theme, specifically the Image and Vision Computing research area.
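
The abstract's remark that predicting a full 3D model needs significantly more memory than predicting a depth image can be illustrated with rough arithmetic. The sketch below is only an illustration under stated assumptions, not the project's method: it assumes the 3D model is stored as a dense voxel grid (the abstract does not say which representation is used), and the resolutions and the helper names `depth_image_bytes` and `voxel_grid_bytes` are hypothetical.

```python
def depth_image_bytes(width: int, height: int, bytes_per_value: int = 4) -> int:
    """Memory for one dense depth image: one value per pixel."""
    return width * height * bytes_per_value


def voxel_grid_bytes(nx: int, ny: int, nz: int, bytes_per_value: int = 4) -> int:
    """Memory for a dense voxel grid: one value (e.g. occupancy) per cell."""
    return nx * ny * nz * bytes_per_value


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration; they are not from the project.
    depth = depth_image_bytes(640, 480)
    voxels = voxel_grid_bytes(256, 256, 256)
    print(f"depth image: {depth / 2**20:.2f} MiB")
    print(f"voxel grid:  {voxels / 2**20:.2f} MiB")
    print(f"ratio:       {voxels / depth:.1f}x")
```

Under these assumed sizes the voxel grid costs tens of times more memory than the depth image, because its cost grows with the cube of the resolution rather than the square. Sparse or hierarchical 3D representations would change this picture.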
