RI: Medium: Collaborative Research: Text-to-Image Reference Resolution for Image Understanding and Manipulation

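Basic information

Award number: 1562098
Principal investigator: Mohit Bansal
Award amount: $275,000
Host institution: University of North Carolina at Chapel Hill
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2016
Funding country: United States
Period: 2016-06-01 to 2021-05-31
Status: Completed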

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1562098&HistoricalAwards=false
Keywords: RI Medium Collaborative Research Text

Project abstract

This project develops new technologies at the interface of computer vision and natural language processing to understand text-to-image relationships. For example, given a captioned image, the project develops techniques which determine which words (e.g. "woman talking on phone", "The farther vehicle") correspond to which image parts. From robotics to human-computer interaction, there are numerous real-world tasks that benefit from practical systems to identify objects in scenes based on language and understand language based on visual context. In particular, the project develops the first language-based image authoring tool which allows users to edit or synthesize realistic imagery using only natural language (e.g. "delete the garbage truck from this photo" or "make an image with three boys chasing a shaggy dog"). Beyond the immediate impact of creating new ways for users to access and author digital images, the broader impacts of this work include three focus areas: the development of new benchmarks for the vision and language communities, outreach and undergraduate research, and leadership in promoting diversity. At the core of the project are new techniques for large-scale text-to-image reference resolution (TIRR) that enable systems to automatically identify the image regions that depict entities described in natural language sentences or commands. These techniques advance image interpretation by enabling systems to perform partial matching between images and sentences, referring expression understanding, and image-based question answering. They also advance image manipulation by enabling systems that can synthesize images starting from a textual description, or modify images based on natural language commands. 
The main technical contributions of the project are: (1) benchmark datasets for TIRR with comprehensive large-scale gold standard annotations that will make TIRR a standard task for recognition; (2) principled new representations for text-to-image annotations that expose the compositional nature of language using the formalism of the denotation graph; (3) new models for TIRR that perform an explicit alignment (grounding) of words and phrases to image regions guided by the structure of the denotation graph; (4) applications of TIRR methods to referring expression understanding and visual question answering; and (5) applications of TIRR to image creation and manipulation based on natural language input.

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Mohit Bansal

iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

DOI: 10.18653/v1/2021.emnlp-demo.33
Published: 2021
Journal: ArXiv
Impact factor: 0
Authors: Eran Hirsch; Alon Eirew; Ori Shapira; Avi Caciularu; Arie Cattan; Ori Ernst; Ramakanth Pasunuru; H. Ronen; Mohit Bansal; Ido Dagan
Corresponding author: Ido Dagan

IMPLI: Investigating NLI Models’ Performance on Figurative Language

DOI:
Published: 2021
Journal:
Impact factor: 0
Authors: J. Devlin; Ming; Kenton Lee; Aniruddha Ghosh; Guofu Li; Tony Veale; Paolo Rosso; Ekaterina Shutova; John Barnden; Hessel Haagsma; Johan Bos; Malvina Nissim; Adith Iyer; Aditya Joshi; Sarvnaz Karimi; Ross Sparks; George Lakoff; Mark Johnson. 1980. Metaphors; Yinhan Liu; Myle Ott; Naman Goyal; Jingfei Du; Mandar Joshi; Danqi Chen; Omer Levy; Mike Lewis; Rui Mao; Chenghua Lin; Frank Guerin; Tom McCoy; Ellie Pavlick; Tal Linzen; Saif M. Mohammad; Peter Tur; Yixin Nie; Yicheng Wang; Mohit Bansal; Adina Williams; Mohit Emily Dinan; Jason Bansal; Weston Douwe; Kiela. 2020
Corresponding author: Kiela. 2020