RI: Medium: Improving grounding, generalization and contextual reasoning in vision and language models


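Basic information

  • Award number:
    2107048
  • Principal investigator:
  • Amount:
    $1.2 million
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-09-01 to 2025-08-31
  • Project status:
    Not yet concluded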

Project abstract

Recent Artificial Intelligence (AI) advances have brought us closer to the possibility of important and exciting real-world applications: ranging from robot assistants for the elderly or differently-abled, to large-scale video analysis of footage from police body-worn cameras to examine police-civilian interactions. Such applications require AI models to understand both visual and natural language cues. However, the state of vision-and-language technology is still not quite ready for these scenarios. Current visual recognition models appear to recognize many different objects but lack an understanding of the interconnection and structure of the visual world. Current image captioning systems output reasonable but completely generic image descriptions. Modern visual question answering systems are not robust to simple changes like synonyms or word rearrangements. This research will lead to fundamental advances in visual recognition and natural language understanding, laying the groundwork for more effective human-machine collaboration. The goal of this research is to move towards a tighter, more accurate and contextual integration of visual recognition and natural language processing. This involves addressing three key challenges: (1) enabling accurate and scalable grounding by establishing robust bi-directional connections between visual input and natural language tokens; (2) improving generalization of vision-and-language models to novel concepts and tasks; and (3) enabling contextual reasoning to allow models to effectively adapt to human or task-specific needs. The unifying theme is that all three challenges require innovation in not only modeling but also in reliable and insightful benchmarking: current evaluation frameworks are insufficient to drive progress in this space. 
The roadmap is to redesign existing benchmarks and evaluation paradigms, use the newly formulated metrics to identify the shortcomings in existing systems, and rely on these insights to drive the deep learning modeling innovations. This research uses the team’s expertise in designing multi-modal models for vision and language as well as in constructing effective large-scale benchmarks. The findings will be disseminated through technical workshops, open access publications, and open-source code. They will also be integrated into undergraduate, graduate and K-12 curricula through collaboration with foundations like AI4ALL. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks
  • DOI:
    10.48550/arxiv.2206.02916
  • Published:
    2022-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhiwei Deng;Olga Russakovsky
  • Corresponding authors:
    Zhiwei Deng;Olga Russakovsky
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
  • DOI:
    10.48550/arxiv.2207.01206
  • Published:
    2022-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shunyu Yao;Howard Chen;John Yang;Karthik Narasimhan
  • Corresponding authors:
    Shunyu Yao;Howard Chen;John Yang;Karthik Narasimhan
Multi-query Video Retrieval
  • DOI:
    10.1007/978-3-031-19781-9_14
  • Published:
    2022-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zeyu Wang;Yu Wu;Karthik Narasimhan;Olga Russakovsky
  • Corresponding authors:
    Zeyu Wang;Yu Wu;Karthik Narasimhan;Olga Russakovsky
ReAct: Synergizing Reasoning and Acting in Language Models

Other publications by Olga Russakovsky

Best of both worlds: human-machine collaboration for object annotation (preliminary version)
  • DOI:
  • Published:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Olga Russakovsky
  • Corresponding author:
    Olga Russakovsky
Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
Correspondences between word learning in children and captioning models
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sunayana Rane;Mira L. Nencheva;Zeyu Wang;C. Lew‐Williams;Olga Russakovsky;Thomas L. Griffiths
  • Corresponding author:
    Thomas L. Griffiths
Beyond web-scraping: Crowd-sourcing a geographically diverse image dataset
  • DOI:
    10.48550/arxiv.2301.02560
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    V. V. Ramaswamy;S. Lin;Dora Zhao;Aaron B. Adcock;L. Maaten;Deepti Ghadiyaram;Olga Russakovsky
  • Corresponding author:
    Olga Russakovsky
UFO: A unified method for controlling Understandability and Faithfulness Objectives in concept-based explanations for CNNs
  • DOI:
    10.48550/arxiv.2303.15632
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    V. V. Ramaswamy;Sunnie S. Y. Kim;Ruth C. Fong;Olga Russakovsky
  • Corresponding author:
    Olga Russakovsky

Other grants held by Olga Russakovsky

CAREER: Overcoming bias in computer vision: Building fairer systems and training diverse leaders
  • Award number:
    2145198
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Continuing Grant

Similar overseas grants

CSR: Medium: Improving the Interface between Machine Learning and Software Systems
  • Award number:
    2313190
  • Fiscal year:
    2023
  • Award amount:
    $1.2 million
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Improving Software Quality by Automatically Reproducing Failures from Bug Reports
  • Award number:
    2403747
  • Fiscal year:
    2023
  • Award amount:
    $1.2 million
  • Award type:
    Continuing Grant
Collaborative Research: SHF: Medium: Improving Software Quality by Automatically Reproducing Failures from Bug Reports
  • Award number:
    2211453
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Continuing Grant
SHF: Medium: Automated Software Engineering Techniques for Improving the Accessibility of Software
  • Award number:
    2211790
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Continuing Grant
HCC: Medium: Improving collaboration in remote teams through tools to promote mutual understanding of nonverbal behavior
  • Award number:
    2212396
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Standard Grant
HCC: Medium: Improving data visualization and analysis tools to support reasoning about analysis assumptions
  • Award number:
    2211939
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Improving Software Quality by Automatically Reproducing Failures from Bug Reports
  • Award number:
    2211454
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Continuing Grant
SHF: Medium: Improving the Efficiency and Applicability of Decision Diagrams
  • Award number:
    2212142
  • Fiscal year:
    2022
  • Award amount:
    $1.2 million
  • Award type:
    Standard Grant
HCC: Medium: Improving Human-AI Collaboration on Decision-Making Tasks
  • Award number:
    2107391
  • Fiscal year:
    2021
  • Award amount:
    $1.2 million
  • Award type:
    Standard Grant
CHS: Medium: Understanding and Improving the Social Impact of High-Bandwidth Farm Networking Infrastructure
  • Award number:
    1955125
  • Fiscal year:
    2020
  • Award amount:
    $1.2 million
  • Award type:
    Standard Grant