CAREER: Visual Question Answering (VQA)

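Basic Information

Award number:
1661374
Principal investigator:
Devi Parikh
Amount:
About $517,000
Awardee institution:
Georgia Tech Research Corporation
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2016
Funding country:
United States
Project period:
2016-10-01 to 2022-07-31
Project status:
Completed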

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1661374&HistoricalAwards=false
Keywords:
CAREER Visual Question Answering VQA

Project Abstract

This project addresses the problem of Visual Question Answering (VQA). Given an image and a free-form natural language question about the image (e.g., "What kind of store is this?", "How many people are waiting in the queue?", "Is it safe to cross the street?"), the machine's task is to automatically produce a concise, accurate, free-form natural language answer ("bakery", "5", "Yes"). VQA is directly applicable to a variety of applications of high societal impact in which humans elicit situationally relevant information from visual data and humans and machines must collaborate to extract information from pictures. Examples include aiding visually impaired users in understanding their surroundings, helping analysts make decisions based on large quantities of surveillance data, and supporting interaction with a robot. This project has the potential to fundamentally improve the way visually impaired users live their daily lives and to revolutionize how society at large interacts with visual data. The research rests on the view that VQA represents not a single narrowly defined problem (e.g., image classification) but rather a rich spectrum of semantic scene understanding problems and associated research directions. Each question in VQA may lie at a different point on this spectrum: from questions that map directly to existing, well-studied computer vision problems ("What is this room called?" is indoor scene recognition) all the way to questions that require an integrated approach of vision (scene), language (semantics), and reasoning (understanding) over a knowledge base ("Does the pizza in the back row next to the bottle of Coke seem vegetarian?"). Consequently, this work maps to a sequence of waypoints along this spectrum.
To address VQA from a variety of perspectives, this research program is generating new datasets, knowledge, and techniques in (i) pure computer vision, (ii) integrating vision and language, (iii) integrating vision, language, and common sense, (iv) building interpretable models, and (v) combining a portfolio of methods. In addition, novel contributions are being made to (a) training the machine to be curious and actively ask questions to learn, (b) using VQA as a modality to learn more about the visual world than existing annotation modalities allow, and (c) training the machine to know what it knows and what it does not.

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Devi Parikh

Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

DOI:
Published:
2014
Journal:
ArXiv
Impact factor:
0
Authors:
Roozbeh Mottaghi;S. Fidler;A. Yuille;R. Urtasun;Devi Parikh
Corresponding author:
Devi Parikh

Dialog System Technology Challenge 7

DOI:
Published:
2019
Journal:
arXiv.org
Impact factor:
0
Authors:
Koichiro Yoshino;Chiori Hori;Julien Perez;L. F. D’Haro;L. Polymenakos;R. Chulaka Gunasekara;Walter S. Lasecki;Jonathan K. Kummerfeld;Michel Galley;Chris Brockett;Jianfeng Gao;W. Dolan;Xiang Gao;Huda AlAmri;Tim K. Marks;Devi Parikh;Dhruv Batra
Corresponding author:
Dhruv Batra

Punny Captions: Witty Wordplay in Image Descriptions

DOI:
10.18653/v1/n18-2121
Published:
2017
Journal:
North American Chapter of the Association for Computational Linguistics (NAACL)
Impact factor:
0
Authors:
Arjun Chandrasekaran;Devi Parikh;Mohit Bansal
Corresponding author:
Mohit Bansal

Knowing who to listen to: Prioritizing experts from a diverse ensemble for attribute personalization

DOI:
10.1109/icip.2016.7533204
Published:
2016
Journal:
IEEE International Conference on Image Processing (ICIP)
Impact factor:
0
Authors:
Shrenik Lad;Bernardino Romera;Julien P. C. Valentin;Philip H. S. Torr;Devi Parikh
Corresponding author:
Devi Parikh

DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL

DOI:
Published:
2019
Journal:
Impact factor:
0
Authors:
Nirbhay Modhe;Prithvijit Chattopadhyay;Mohit Sharma;Abhishek Das;Devi Parikh;Dhruv Batra;Ramakrishna Vedantam
Corresponding author:
Ramakrishna Vedantam