CAREER: Structure-Preserving Multimodal Alignment between Vision and Language

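Basic Information

Award number:
2239840
Principal investigator:
Humphrey Shi
Amount:
$563K
Institution:
University of Oregon Eugene
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-07-01 to 2028-06-30
Project status:
Ongoing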

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2239840&HistoricalAwards=false
Keywords:
CAREER Structure Preserving Multimodal Alignment

Project Abstract

A grand challenge in artificial intelligence (AI) is to process multimodal vision and language data while preserving the relationships across modalities, so that the linkages between them are sustained. Current machine learning systems do not fully grasp the structures and relationships that exist within human vision and language, and thus have difficulty producing the desired outcomes in terms of interpretability, efficiency, measurability, and causality. This project tackles the fundamental multimodal alignment problem in machine learning and will advance research in both computer vision and natural language processing, especially in the disruptive innovation areas of multimodal vision-language generation and understanding. It will lead to breakthroughs in both the theoretical understanding and the practical applications of vision and language. The techniques developed under this project are not limited to vision and language and could similarly be used to connect different types of latent structures across modalities. This would be extremely beneficial for responsible AI applications in the sciences, where people want to understand not only the relationships in data but also the structure and causal explanations. Such an understanding is also critical for reducing the demographic biases that machine learning models exhibit. Through education, open-sourcing, and outreach activities, this project will train and educate students of all levels, from K-12 to graduate, in AI, advance theoretical vision and language courses, reduce bias, and further democratize AI.
Preserving structure is an essential component of understanding how to make machine learning models better and more reliable. This project aims to create novel and significant scientific advances in multimodal vision and language modeling, using structure-preserving latent space alignment to build a bridge between vision and language. The project aims to strengthen the structure-preserving nature of linguistic and visual embeddings and to develop a map between the two latent representations that preserves the underlying structures. In particular, the project will achieve these goals through four thrusts: (I) developing structure-preserving latent representations and mappings between vision and language; (II) improving learning and data efficiency through latent structures; (III) developing novel evaluation metrics based on structural information to improve measurability; and (IV) developing a causal representation and interpretation framework.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

DOI:
10.1109/iccv51070.2023.00713
Published:
2022-11
Venue:
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Impact factor:
0
Authors:
Xingqian Xu;Zhangyang Wang;Eric Zhang;Kai Wang;Humphrey Shi
Corresponding authors:
Xingqian Xu;Zhangyang Wang;Eric Zhang;Kai Wang;Humphrey Shi

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

DOI:
10.1109/iccv51070.2023.01462
Published:
2023-03
Venue:
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Impact factor:
0
Authors:
Levon Khachatryan;A. Movsisyan;Vahram Tadevosyan;Roberto Henschel;Zhangyang Wang;Shant Navasardyan;Humphrey Shi
Corresponding authors:
Levon Khachatryan;A. Movsisyan;Vahram Tadevosyan;Roberto Henschel;Zhangyang Wang;Shant Navasardyan;Humphrey Shi

Other publications by Humphrey Shi

A Novel Framework for 3D-2D Vertebra Matching

DOI:
10.1109/mipr.2019.00029
Published:
2019
Venue:
2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)
Impact factor:
0
Authors:
Hanchao Yu;Yang Fu;Haichao Yu;Yunchao Wei;Xinchao Wang;Jianbo Jiao;Matthew Bramler;T. Kesavadas;Humphrey Shi;Zhangyang Wang;B. Wen;Thomas S. Huang
Corresponding author:
Thomas S. Huang

Capitalist Potatoes

DOI:
Published:
2020
Venue:
Feeding the People
Impact factor:
0
Authors:
Jiayi Guo;Hayk Manukyan;Chenyu Yang;Chaofei Wang;Levon Khachatryan;Shant Navasardyan;Shiji Song;Humphrey Shi;Gao Huang
Corresponding author:
Gao Huang

Appendix for SeMask: Semantically Masked Transformers for Semantic Segmentation

DOI:
Published:
Venue:
Impact factor:
0
Authors:
Jitesh Jain;Anukriti Singh;Nikita Orlov;Zilong Huang;Jiachen Li;Steven Walton;Humphrey Shi
Corresponding author:
Humphrey Shi

Appendix for OneFormer: One Transformer to Rule Universal Image Segmentation

DOI:
Published:
2023
Venue:
Impact factor:
0
Authors:
Jitesh Jain;Jiacheng Li;M. Chiu;Ali Hassani;Nikita Orlov;Humphrey Shi
Corresponding author:
Humphrey Shi

Geometry-Aware Traffic Flow Analysis by Detection and Tracking

DOI:
10.1109/cvprw.2018.00023
Published:
2018-06
Venue:
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Impact factor:
0
Authors:
Humphrey Shi
Corresponding author:
Humphrey Shi