RI: Small: Visual How: Task Understanding and Description in the Real World

Basic Information

Award number:
2143197
Principal investigator:
Qi Zhao
Amount:
About $262,200
Institution:
University of Minnesota-Twin Cities
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-06-15 to 2025-05-31
Status:
Not yet concluded

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2143197&HistoricalAwards=false
Keywords:
RI Small Visual How Task

Project Abstract

Problem solving is an innate capability that humans develop through evolution and experience. Compared to human intelligence that can solve general and complex problems, current AI systems only perform well in narrow and structured tasks. With the overarching goal of bridging this gap, this project develops AI systems that can understand general real-world tasks (e.g., How to set up a tent? How to teach kids to garden? How to travel in London?) and come up with solutions with step-by-step language and visual guidance. It will allow for real-world tasks to be solved even in general and complex circumstances, resulting in more human-like AI. Ultimately, the project will take a step forward toward artificial general intelligence. The project will provide a publicly available dataset, a framework of computational models, and a mobile application prototype. Furthermore, this project will support integrated research and education with a focus on increasing minority participation through K-12 outreach, underrepresented and undergraduate mentoring, and curriculum development.

This project proposes a VisualHow problem that represents a rich spectrum of real-world tasks. The generality and complexity of the problem call for capabilities to understand the visual and textual contents of the task, reason with knowledge relevant to the task, and generate step-by-step multimodal descriptions about how the task can be completed. This project aims to achieve these goals in three tasks. First, generate a new dataset with diverse and real-world tasks and solutions, with rich annotations of key semantics and task structures to guide the multimodal attention and structural reasoning. Second, develop a novel framework in which a series of models are derived for explainable VisualHow learning to understand the visual-textual contents and generate steps to complete real-world tasks. Third, develop novel methods to generalize the models with knowledge and validate them on mobile platforms to assist people in real-world applications. Achieving these goals will not only lead to new vision-language tasks and computational methods for real-world problem solving, but also spur innovations in the development of explainable and generalizable AI models and systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning

DOI:
10.48550/arxiv.2303.10482
Published:
2023-03
Journal:
ArXiv
Impact factor:
0
Authors:
Shi Chen;Qi Zhao
Corresponding authors:
Shi Chen;Qi Zhao

VisualHow: Multimodal Problem Solving

DOI:
10.1109/cvpr52688.2022.01518
Published:
2022-06
Journal:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Impact factor:
0
Authors:
Jinhui Yang;Xianyu Chen;Ming Jiang;Shi Chen;Louis Wang;Qi Zhao
Corresponding authors:
Jinhui Yang;Xianyu Chen;Ming Jiang;Shi Chen;Louis Wang;Qi Zhao

Other publications by Qi Zhao

Recombinant-fully-human-antibody decorated highly-stable far-red AIEdots for in vivo HER-2 receptor-targeted imaging

DOI:
10.1039/c8cc03037e
Published:
2018
Journal:
Chemical Communications
Impact factor:
4.9
Authors:
Yayun Wu;Zhizhen Chen;Pengfei Zhang;Lihua Zhou;Tao Jiang;Huajie Chen;Ping Gong;Dimiter S. Dimitrov;Lintao Cai;Qi Zhao
Corresponding author:
Qi Zhao

Fate and reactions of methane during biodegradation in an aquifer contaminated with petroleum hydrocarbons in Northeast China

DOI:
10.2343/geochemj.2.0400
Published:
2016
Journal:
Geochemical Journal
Impact factor:
0.8
Authors:
X. Su;Ende Zuo;Hang Lv;Qi Zhao;Pucheng Zhu;G. Lin;Mingyao Liu
Corresponding author:
Mingyao Liu

An Investigation of the Uncertainty of Handbook of Emission Factors for Road Transport (HBEFA) for Estimating Greenhouse Gas Emissions: A Case Study in Beijing

DOI:
10.1177/0361198118796710
Published:
2018-09
Journal:
Transportation Research Record
Impact factor:
1.7
Authors:
Hongyu Lu;Guohua Song;Qi Zhao;Jingyi Wang;Weinan He;Lei Yu
Corresponding author:
Lei Yu

An Improved Adaptive Kalman Filter for Altitude Estimation of Quadrotors

DOI:
10.23919/chicc.2019.8866453
Published:
2019
Journal:
2019 Chinese Control Conference (CCC)
Impact factor:
0
Authors:
Qi Zhao;Fenghua He;Ning Hao;Rui Xing
Corresponding author:
Rui Xing

A sequence-based generalization of mean-field annealing using the Forward/Backward algorithm: Application to image segmentation

DOI:
10.1109/icassp.2002.5743955
Published:
2002
Journal:
2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
Impact factor:
0
Authors:
David J. Miller;P. Bunyaratavej;Qi Zhao
Corresponding author:
Qi Zhao