RI: Small: Visual Reasoning and Self-questioning for Explainable Visual Question Answering

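Basic Information

Award number:
2007613
Principal investigator:
Ying Wu
Amount:
About $469,200
Host institution:
Northwestern University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Project period:
2020-10-01 to 2024-09-30
Project status:
Completed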

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2007613&HistoricalAwards=false
Keywords:
RI Small Visual Reasoning Self

Project Abstract

Visual question answering (VQA), which aims to answer a natural-language question about a given image, is still in its infancy. Current approaches lack the flexibility and generalizability to handle diverse questions without training. It is therefore desirable to explore explainable VQA (or X-VQA), which provides explanations of its reasoning in natural language in addition to answers. This requires integrating computer vision, natural language, and knowledge representation, and it is an incredibly challenging task. By exploring X-VQA, this project advances and enriches the fundamentals of computer vision, image understanding, visual semantic analysis, machine learning, and knowledge representation. It also facilitates a wide range of applications, including visual chatbots, visual retrieval and recommendation, and human-computer interaction. This research also contributes to education through curriculum development, student training, and knowledge dissemination. It includes interactions with K-12 students for participation and research opportunities.

The major goal of this research is to develop a novel computational model, with a solid theoretical foundation and effective methods, to facilitate X-VQA that provides explanations of its visual reasoning. This challenging task involves many fundamental aspects and needs to integrate vision, language, learning, and knowledge. This project focuses on:

(1) A unified computational model of X-VQA and its theoretical foundation. This model integrates domain knowledge and visual observations for reasoning: what hidden facts can be inferred from incomplete and inaccurate visual observations, and how; how visual observations, hidden facts, and domain knowledge can be represented for efficient question answering; and how question answering can be made scalable. The study of these critical issues creates the foundation for X-VQA;

(2) A new model for question-driven, task-oriented visual observation. It is inefficient to collect all visual observations before answering a question. Vision needs to be question-driven and task-oriented. This project pursues a new model for the interaction of questions, visual reasoning, and visual observation, so as to automatically steer attention to the question-related aspects of an image;

(3) An innovative approach to self-questioning for training X-VQA agents. Training based simply on question-answer data is not viable for X-VQA, as it is unable to provide explanations for and insights into the answer. This project pursues a novel approach to self-questioning, in which the VQA agents can also generate and ask questions. It investigates how self-questioning can be combined with reinforcement learning, and how it can deal with versatile questions to improve the scalability of X-VQA; and

(4) A solid case study on X-VQA.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (11)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Unsupervised Depth Completion and Denoising for RGB-D Sensors

DOI:
10.1109/icra46639.2022.9812392
Published:
2022
Journal:
2022 International Conference on Robotics and Automation (ICRA)
Impact factor:
0
Authors:
Fan, Lei;Li, Yunxuan;Jiang, Chen;Wu, Ying
Corresponding author:
Wu, Ying

Morphable Detector for Object Detection on Demand

DOI:
10.1109/iccv48922.2021.00473
Published:
2021-10
Journal:
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Impact factor:
0
Authors:
Xiangyun Zhao;Xu Zou;Ying Wu
Corresponding authors:
Xiangyun Zhao;Xu Zou;Ying Wu

Avoiding Lingering in Learning Active Recognition by Adversarial Disturbance

DOI:
10.1109/wacv56688.2023.00459
Published:
2023-01
Journal:
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Impact factor:
0
Authors:
Lei Fan;Ying Wu
Corresponding authors:
Lei Fan;Ying Wu

Contrastive Learning for Label Efficient Semantic Segmentation

DOI:
10.1109/iccv48922.2021.01045
Published:
2020-12
Journal:
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Impact factor:
0
Authors:
Xiangyun Zhao;Raviteja Vemulapalli;P. A. Mansfield;Boqing Gong;Bradley Green;Lior Shapira;Ying Wu
Corresponding authors:
Xiangyun Zhao;Raviteja Vemulapalli;P. A. Mansfield;Boqing Gong;Bradley Green;Lior Shapira;Ying Wu

Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization

DOI:
10.1109/wacv56688.2023.00597
Published:
2023-01
Journal:
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Impact factor:
0
Authors:
Jianxiong Zhou;Ying Wu
Corresponding authors:
Jianxiong Zhou;Ying Wu

Other publications by Ying Wu

Evaluation of Hepatocellular Carcinoma by Contrast‐Enhanced Sonography

DOI:
Published:
2011
Journal:
Journal of Ultrasound in Medicine
Impact factor:
2.3
Authors:
Jin Feng Xu;Hui Yu Liu;Yang Shi;Zhang Hong Wei;Ying Wu
Corresponding author:
Ying Wu

Effects of APC De-Targeting and GAr Modification on the Duration of Luciferase Expression from Plasmid DNA Delivered to Skeletal Muscle

DOI:
Published:
2014
Journal:
Current Gene Therapy
Impact factor:
3.6
Authors:
M. C. Subang;Rewas Fatah;Ying Wu;D. Hannaman;J. Rice;C. Evans;Y. Chernajovsky;D. Gould
Corresponding author:
D. Gould

A novel thiazolidinedione derivative TD118 showing selective algicidal effects for red tide control

DOI:
Published:
2014
Journal:
World Journal of Microbiology & Biotechnology
Impact factor:
4.1
Authors:
Ying Wu;Yew Lee;Seul;Minju Kim;C. Eom;S. Kim;Hoon Cho;E. Jin
Corresponding author:
E. Jin

Qualitative Research on Dementia in Ethnically Diverse Communities

DOI:
Published:
2013
Journal:
American Journal of Alzheimer's Disease & Other Dementias
Impact factor:
3.4
Authors:
C. Shanley;Desiree Leone;Y. Santalucia;Jon Adams;Jorge Enrique Ferrerosa;Fatima Kourouche;Silvana Gava;Ying Wu
Corresponding author:
Ying Wu

The Orally Active Glutamate Carboxypeptidase II Inhibitor E2072 Exhibits Sustained Nerve Exposure and Attenuates Peripheral Neuropathy

DOI:
Published:
2012
Journal:
Journal of Pharmacology and Experimental Therapeutics
Impact factor:
3.5
Authors:
K. Wozniak;Ying Wu;J. Vornov;R. Lapidus;R. Rais;C. Rojas;T. Tsukamoto;B. Slusher
Corresponding author:
B. Slusher