RI: Medium: Improving grounding, generalization and contextual reasoning in vision and language models

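Basic information

Award number:
2107048
Principal investigator:
Olga Russakovsky
Amount:
$1.2 million
Host institution:
Princeton University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Start and end dates:
2021-09-01 to 2025-08-31
Project status:
Not yet concluded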

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2107048&HistoricalAwards=false
Keywords:
RI Medium Improving grounding generalization

Project abstract

Recent Artificial Intelligence (AI) advances have brought us closer to the possibility of important and exciting real-world applications: ranging from robot assistants for the elderly or differently-abled, to large-scale video analysis of footage from police body-worn cameras to examine police-civilian interactions. Such applications require AI models to understand both visual and natural language cues. However, the state of vision-and-language technology is still not quite ready for these scenarios. Current visual recognition models appear to recognize many different objects but lack an understanding of the interconnection and structure of the visual world. Current image captioning systems output reasonable but completely generic image descriptions. Modern visual question answering systems are not robust to simple changes like synonyms or word rearrangements. This research will lead to fundamental advances in visual recognition and natural language understanding, laying the groundwork for more effective human-machine collaboration.

The goal of this research is to move towards a tighter, more accurate and contextual integration of visual recognition and natural language processing. This involves addressing three key challenges: (1) enabling accurate and scalable grounding by establishing robust bi-directional connections between visual input and natural language tokens; (2) improving generalization of vision-and-language models to novel concepts and tasks; and (3) enabling contextual reasoning to allow models to effectively adapt to human or task-specific needs. The unifying theme is that all three challenges require innovation in not only modeling but also in reliable and insightful benchmarking: current evaluation frameworks are insufficient to drive progress in this space. The roadmap is to redesign existing benchmarks and evaluation paradigms, use the newly formulated metrics to identify the shortcomings in existing systems, and rely on these insights to drive the deep learning modeling innovations. This research uses the team’s expertise in designing multi-modal models for vision and language as well as in constructing effective large-scale benchmarks. The findings will be disseminated through technical workshops, open access publications, and open-source code. They will also be integrated into undergraduate, graduate and K-12 curriculum through collaboration with foundations like AI4ALL.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal papers (5)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks

DOI:
10.48550/arxiv.2206.02916
Publication date:
2022-06
Journal:
ArXiv
Impact factor:
0
Authors:
Zhiwei Deng; Olga Russakovsky
Corresponding authors:
Zhiwei Deng; Olga Russakovsky

WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

DOI:
10.48550/arxiv.2207.01206
Publication date:
2022-07
Journal:
ArXiv
Impact factor:
0
Authors:
Shunyu Yao; Howard Chen; John Yang; Karthik Narasimhan
Corresponding authors:
Shunyu Yao; Howard Chen; John Yang; Karthik Narasimhan

Multi-query Video Retrieval

DOI:
10.1007/978-3-031-19781-9_14
Publication date:
2022-01
Journal:
Impact factor:
0
Authors:
Zeyu Wang; Yu Wu; Karthik Narasimhan; Olga Russakovsky
Corresponding authors:
Zeyu Wang; Yu Wu; Karthik Narasimhan; Olga Russakovsky

ReAct: Synergizing Reasoning and Acting in Language Models

DOI:
Publication date:
2023
Journal:
International Conference on Learning Representations (ICLR)
Impact factor:
0
Authors:
Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan
Corresponding author:
Cao, Yuan

Other publications by Olga Russakovsky

Best of both worlds: human-machine collaboration for object annotation (preliminary version)

DOI:
Publication date:
2015
Journal:
Impact factor:
0
Authors:
Olga Russakovsky
Corresponding author:
Olga Russakovsky

Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation

DOI:
Publication date:
2020
Journal:
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Impact factor:
0
Authors:
Felix Yu; Zhiwei Deng; Karthik Narasimhan; Olga Russakovsky
Corresponding author:
Olga Russakovsky

Correspondences Between Word Learning in Children and Captioning Models

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Sunayana Rane; Mira L. Nencheva; Zeyu Wang; C. Lew-Williams; Olga Russakovsky; Thomas L. Griffiths
Corresponding author:
Thomas L. Griffiths

Beyond web-scraping: Crowd-sourcing a geographically diverse image dataset

DOI:
10.48550/arxiv.2301.02560
Publication date:
2023
Journal:
ArXiv
Impact factor:
0
Authors:
V. V. Ramaswamy; S. Lin; Dora Zhao; Aaron B. Adcock; L. Maaten; Deepti Ghadiyaram; Olga Russakovsky
Corresponding author:
Olga Russakovsky