CAREER: Visual Question Answering (VQA)

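Basic Information

  • Award number:
    1661374
  • Principal investigator:
  • Amount:
    $517,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Project period:
    2016-10-01 to 2022-07-31
  • Project status:
    Completed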

Project Abstract

This project addresses the problem of Visual Question Answering (VQA). Given an image and a free-form natural language question about the image (e.g., "What kind of store is this?", "How many people are waiting in the queue?", "Is it safe to cross the street?"), the machine's task is to automatically produce a concise, accurate, free-form, natural language answer ("bakery", "5", "Yes"). VQA is directly applicable to a variety of applications of high societal impact in which humans elicit situationally-relevant information from visual data, and in which humans and machines must collaborate to extract information from pictures. Examples include helping visually-impaired users understand their surroundings, helping analysts make decisions based on large quantities of surveillance data, and supporting interaction with a robot. This project has the potential to fundamentally improve the way visually-impaired users live their daily lives, and to revolutionize how society at large interacts with visual data.
This research treats VQA not as a single narrowly-defined problem (e.g., image classification) but as a rich spectrum of semantic scene understanding problems and associated research directions. Each question in VQA may lie at a different point on this spectrum: from questions that map directly to existing well-studied computer-vision problems ("What is this room called?" corresponds to indoor scene recognition) all the way to questions that require an integrated approach of vision (scene), language (semantics), and reasoning (understanding) over a knowledge base ("Does the pizza in the back row next to the bottle of Coke seem vegetarian?"). Consequently, this work maps to a sequence of waypoints along this spectrum.
Motivated by addressing VQA from a variety of perspectives, this research program is generating new datasets, knowledge, and techniques in (i) pure computer vision, (ii) integrating vision + language, (iii) integrating vision + language + common sense, (iv) building interpretable models, and (v) combining a portfolio of methods. In addition, novel contributions are being made to (a) training the machine to be curious and actively ask questions in order to learn, (b) using VQA as a modality to learn more about the visual world than existing annotation modalities allow, and (c) training the machine to know what it knows and what it does not.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Devi Parikh

Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
  • DOI:
  • Year of publication:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Roozbeh Mottaghi;S. Fidler;A. Yuille;R. Urtasun;Devi Parikh
  • Corresponding author:
    Devi Parikh
Dialog System Technology Challenge 7
  • DOI:
  • Year of publication:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Koichiro Yoshino;Chiori Hori;Julien Perez;L. F. D’Haro;L. Polymenakos;R. Chulaka Gunasekara;Walter S. Lasecki;Jonathan K. Kummerfeld;Michel Galley;Chris Brockett;Jianfeng Gao;W. Dolan;Xiang Gao;Huda AlAmri;Tim K. Marks;Devi Parikh;Dhruv Batra
  • Corresponding author:
    Dhruv Batra
Punny Captions: Witty Wordplay in Image Descriptions
Knowing who to listen to: Prioritizing experts from a diverse ensemble for attribute personalization
DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL
  • DOI:
  • Year of publication:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nirbhay Modhe;Prithvijit Chattopadhyay;Mohit Sharma;Abhishek Das;Devi Parikh;Dhruv Batra;Ramakrishna Vedantam
  • Corresponding author:
    Ramakrishna Vedantam

Other grants by Devi Parikh

CAREER: Visual Question Answering (VQA)
  • Award number:
    1552377
  • Fiscal year:
    2016
  • Funding amount:
    $517,000
  • Project type:
    Continuing Grant
RI: Small: Debugging Machine Visual Recognition via Humans in the Loop
  • Award number:
    1341772
  • Fiscal year:
    2013
  • Funding amount:
    $517,000
  • Project type:
    Standard Grant
RI: Small: Debugging Machine Visual Recognition via Humans in the Loop
  • Award number:
    1115719
  • Fiscal year:
    2011
  • Funding amount:
    $517,000
  • Project type:
    Standard Grant

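Similar NSFC (National Natural Science Foundation of China) grants

Research on Visual Hull Reconstruction from Multiple Images and Surface Property Modeling Algorithms
  • Award number:
    60373031
  • Approval year:
    2003
  • Funding amount:
    CNY 230,000
  • Project type:
    General Program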

Similar overseas grants

Compositional Generalization in open-domain Visual Question Answering: A New Direction using Multimodal Grounded Representations using Graph Neural Networks
  • Award number:
    559027-2021
  • Fiscal year:
    2022
  • Funding amount:
    $517,000
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Compositional Generalization in open-domain Visual Question Answering: A New Direction using Multimodal Grounded Representations using Graph Neural Networks
  • Award number:
    559027-2021
  • Fiscal year:
    2021
  • Funding amount:
    $517,000
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
RI: Small: Visual Reasoning and Self-questioning for Explainable Visual Question Answering
  • Award number:
    2007613
  • Fiscal year:
    2020
  • Funding amount:
    $517,000
  • Project type:
    Standard Grant
CRII: CHS: Predicting When, Why, and How Multiple People Will Disagree when Answering a Visual Question
  • Award number:
    1755593
  • Fiscal year:
    2018
  • Funding amount:
    $517,000
  • Project type:
    Standard Grant
Visual Question Answering focused on properties (materials, shapes) and relations of small objects
  • Award number:
    2127907
  • Fiscal year:
    2018
  • Funding amount:
    $517,000
  • Project type:
    Studentship
RI: Small: A Cognitive Framework for Technical, Hard and Explainable Question Answering (THE-QA) with respect to Combined Textual and Visual Inputs
  • Award number:
    1816039
  • Fiscal year:
    2018
  • Funding amount:
    $517,000
  • Project type:
    Standard Grant
Visual Question Answering System with a Knowledge Base
  • Award number:
    18H03264
  • Fiscal year:
    2018
  • Funding amount:
    $517,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Deep Learning for Visual Question Answering
  • Award number:
    512089-2017
  • Fiscal year:
    2017
  • Funding amount:
    $517,000
  • Project type:
    University Undergraduate Student Research Awards
CAREER: Visual Question Answering (VQA)
  • Award number:
    1552377
  • Fiscal year:
    2016
  • Funding amount:
    $517,000
  • Project type:
    Continuing Grant
Research on the question of visibility of visual image works under the digital environment
  • Award number:
    24520192
  • Fiscal year:
    2012
  • Funding amount:
    $517,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)