BIGDATA: Small: DA: DCM: Labeling the World

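Basic Information

Award number:
1250793
Principal investigator:
Steven Seitz
Amount:
$0.75 million
Host institution:
University of Washington
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2013
Funding country:
United States
Duration:
2013-04-01 to 2017-03-31
Project status:
Completed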

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1250793&HistoricalAwards=false
Keywords:
BIGDATA Small DA DCM Labeling

Project Abstract

The project aims to leverage the massive corpus of online photos, text, and maps to create a semantic 3D labeled model of the world, e.g., detailed representations of the world's top cultural and historical sites. While breakthroughs in computer vision enable creating detailed 3D models from millions of online 2D images, the resulting models capture only geometry. Consequently, they lack semantics; they don't provide information about the contents of the scene. The vast treasure trove of online text such as Wikipedia meticulously catalogs the scenes that are captured in photos and models. Modern Natural Language Processing (NLP) techniques can now process such data, opening up the opportunity to extract knowledge from the online text corpus and use it to label 3D geometry. This project seeks to jointly analyze the massive corpus of online text, maps, and photos to create labeled 3D models of the world's sites. Achieving this goal will require fundamental research advances at the interface of natural language processing and computer vision that impact both the scientific research community and the world at large. The project addresses two key technical challenges: (1) automatic scene labeling: mapping semantics onto geometry, and (2) solving the 3D jigsaw puzzle: mapping pieces of geometry into the world. Many clues to these mapping problems lie in the text and other online data sources such as floorplans. Other clues lie in the content of the photos. Decoding this mapping therefore involves an interplay between NLP and computer vision.
The key research advances center on new ways to jointly leverage computer vision and NLP to solve challenging problems in both fields, specifically: (1) recognizing objects through joint NLP and 3D visual analysis, (2) placing objects in the world by correlating geometry with spatial text in maps and webpages, and (3) using semantics to improve geometry by augmenting visual cues with textual spatial relations.

Broader Impacts: The primary research outcomes are: (1) technology for creating labeled 3D models at a massive scale, and (2) labeled models for many top tourist sites. Both the algorithms and models will be made freely available for the research community. These algorithms and models will provide the foundation for a range of exciting applications of major practical impact on the world at large. The resulting tools could make it possible for resources such as Wikipedia to link the text directly to 3D models and vice-versa, with attendant benefits to online learning and education. The same technology could enable automated labeling of 2D photographs. In the context of real-time applications (e.g., augmented reality), the technology could provide visual overlays and instant feedback on what you are currently looking at, and enable augmented reality-style guided tours. Other applications include using labeled geometry for navigation (walking directions), and converting images to text for the visually impaired. The research is tightly integrated into education and training of students at the University of Washington. Additional information about the project can be found at: http://grail.cs.washington.edu/projects/label3d/
