CAREER: Structure-Preserving Multimodal Alignment between Vision and Language
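Basic information
- Award number: 2239840
- Principal investigator:
- Amount: $563,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-07-01 to 2028-06-30
- Project status: Ongoing (not yet concluded)
- Source:
- Keywords:
Project abstract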
A grand challenge in artificial intelligence (AI) is to process multimodal vision and language data while preserving the relationships across modalities, so that the links between them are sustained. Current machine learning systems do not fully grasp the structures and relationships that exist within human vision and language, and thus have difficulty producing the desired outcomes in terms of interpretability, efficiency, measurability, and causality. This project tackles the fundamental multimodal alignment problem in machine learning and will advance research in both computer vision and natural language processing, especially in the disruptive innovation areas of multimodal vision-language generation and understanding. It will lead to breakthroughs in both the theoretical understanding and the practical applications of vision and language. The techniques developed under this project could similarly be used to connect different types of latent structures across modalities and are not limited to vision and language. This would be extremely beneficial for responsible AI applications in the sciences, where people want to understand not only the relationships in data but also their structure and causal explanations. Such an understanding is also critical for reducing the demographic biases that machine learning models exhibit. Through education, open-sourcing, and outreach activities, this project will train and educate students at all levels, from K-12 to graduate, in AI, advance theoretical vision and language courses, reduce bias, and further democratize AI.
Preserving structure is an essential component of understanding how to make machine learning models better and more reliable. This project aims to create novel and significant scientific advances in multimodal vision and language modeling, using structure-preserving latent space alignment to build a bridge between vision and language. The project aims to make linguistic and visual embeddings more structure-preserving and to develop a map between the two latent representations that preserves their underlying structures. In particular, the project will pursue these goals through four thrusts: (I) developing structure-preserving latent representations and mappings between vision and language; (II) improving learning and data efficiency through latent structures; (III) developing novel evaluation metrics based on structural information to improve measurability; (IV) developing a causal representation and interpretation framework.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
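As a rough illustration of what "preserving structure" across two latent spaces can mean, the sketch below scores how well the pairwise-distance pattern of one set of embeddings matches that of a paired set. It is only an assumption-laden toy, not the project's method or metric: the function names, the toy embeddings, and the choice of Pearson correlation over pairwise Euclidean distances are all hypothetical and introduced here for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def structure_score(image_embeddings, text_embeddings):
    """Correlate the pairwise-distance patterns of two paired embedding sets."""
    return pearson(pairwise_distances(image_embeddings),
                   pairwise_distances(text_embeddings))


# Toy data (hypothetical): row i of each list describes the same item.
image_embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
# A rotated and uniformly scaled copy keeps the relative geometry intact.
text_aligned = [(0.0, 0.0), (0.0, 2.0), (-4.0, 0.0), (-6.0, 6.0)]
# Shuffling the rows breaks the correspondence between the two spaces.
text_shuffled = [text_aligned[i] for i in (2, 0, 3, 1)]

aligned_score = structure_score(image_embeddings, text_aligned)
shuffled_score = structure_score(image_embeddings, text_shuffled)
print(f"aligned:  {aligned_score:.3f}")
print(f"shuffled: {shuffled_score:.3f}")
```

A score near 1 means the two spaces arrange their items with the same relative geometry, even if one is rotated or rescaled; the shuffled pairing scores much lower because the correspondence between items is lost.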
Project outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
- DOI: 10.1109/iccv51070.2023.00713
- Published: 2022-11
- Journal:
- Impact factor: 0
- Authors: Xingqian Xu; Zhangyang Wang; Eric Zhang; Kai Wang; Humphrey Shi
- Corresponding authors: Xingqian Xu; Zhangyang Wang; Eric Zhang; Kai Wang; Humphrey Shi
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- DOI: 10.1109/iccv51070.2023.01462
- Published: 2023-03
- Journal:
- Impact factor: 0
- Authors: Levon Khachatryan; A. Movsisyan; Vahram Tadevosyan; Roberto Henschel; Zhangyang Wang; Shant Navasardyan; Humphrey Shi
- Corresponding authors: Levon Khachatryan; A. Movsisyan; Vahram Tadevosyan; Roberto Henschel; Zhangyang Wang; Shant Navasardyan; Humphrey Shi
Other publications by Humphrey Shi
A Novel Framework for 3D-2D Vertebra Matching
- DOI: 10.1109/mipr.2019.00029
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Hanchao Yu; Yang Fu; Haichao Yu; Yunchao Wei; Xinchao Wang; Jianbo Jiao; Matthew Bramler; T. Kesavadas; Humphrey Shi; Zhangyang Wang; B. Wen; Thomas S. Huang
- Corresponding author: Thomas S. Huang
Capitalist Potatoes
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Jiayi Guo; Hayk Manukyan; Chenyu Yang; Chaofei Wang; Levon Khachatryan; Shant Navasardyan; Shiji Song; Humphrey Shi; Gao Huang
- Corresponding author: Gao Huang
Appendix for SeMask: Semantically Masked Transformers for Semantic Segmentation
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Jitesh Jain; Anukriti Singh; Nikita Orlov; Zilong Huang; Jiachen Li; Steven Walton; Humphrey Shi
- Corresponding author: Humphrey Shi
Appendix for OneFormer: One Transformer to Rule Universal Image Segmentation
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Jitesh Jain; Jiacheng Li; M. Chiu; Ali Hassani; Nikita Orlov; Humphrey Shi
- Corresponding author: Humphrey Shi
Geometry-Aware Traffic Flow Analysis by Detection and Tracking
- DOI: 10.1109/cvprw.2018.00023
- Published: 2018-06
- Journal:
- Impact factor: 0
- Authors: Humphrey Shi
- Corresponding author: Humphrey Shi
Similar overseas grants
Design and Analysis of Structure Preserving Discretizations to Simulate Pattern Formation in Liquid Crystals and Ferrofluids
- Award number: 2409989
- Fiscal year: 2024
- Amount: $563,000
- Project type: Standard Grant
Structure-Preserving Integrators for Lévy-Driven Stochastic Systems
- Award number: EP/Y033248/1
- Fiscal year: 2024
- Amount: $563,000
- Project type: Research Grant
Structure theory for measure-preserving systems, additive combinatorics, and correlations of multiplicative functions
- Award number: 2347850
- Fiscal year: 2024
- Amount: $563,000
- Project type: Continuing Grant
Collaborative Research: Accurate and Structure-Preserving Numerical Schemes for Variable Temperature Phase Field Models and Efficient Solvers
- Award number: 2309547
- Fiscal year: 2023
- Amount: $563,000
- Project type: Standard Grant
Nonlinear logarithmic difference operators and their application to structure-preserving numerical methods
- Award number: 23K17655
- Fiscal year: 2023
- Amount: $563,000
- Project type: Grant-in-Aid for Challenging Research (Exploratory)
Structure-Preserving Finite Element Methods for Incompressible Flow on Smooth Domains and Surfaces
- Award number: 2309425
- Fiscal year: 2023
- Amount: $563,000
- Project type: Standard Grant
Collaborative Research: Arbitrary Order Structure-Preserving Discontinuous Galerkin Methods for Compressible Euler Equations With Self-Gravity in Astrophysical Flows
- Award number: 2309591
- Fiscal year: 2023
- Amount: $563,000
- Project type: Standard Grant
Expressivity of Structure-Preserving Deep Neural Networks for the Space-Time Approximation of High-Dimensional Nonlinear Partial Differential Equations with Boundaries
- Award number: 2318032
- Fiscal year: 2023
- Amount: $563,000
- Project type: Continuing Grant
Collaborative Research: Accurate and Structure-Preserving Numerical Schemes for Variable Temperature Phase Field Models and Efficient Solvers
- Award number: 2309548
- Fiscal year: 2023
- Amount: $563,000
- Project type: Standard Grant
Structure-preserving machine learning moment closures for kinetic equations
- Award number: 2309655
- Fiscal year: 2023
- Amount: $563,000
- Project type: Standard Grant