CAREER: Natural Narratives and Multimodal Context as Weak Supervision for Learning Object Categories

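Basic information

  • Award number:
    2046853
  • Principal investigator:
  • Amount:
    $547,100
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Start and end dates:
    2021-05-01 to 2026-04-30
  • Project status:
    Ongoing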

Project abstract

This project develops a framework to train computer vision models for detection of objects from weak, naturally-occurring supervision of language (text or speech) and additional multimodal signals. It considers dynamic settings, where humans interact with their visual environment and refer to the encountered objects, e.g., “Carefully put the tomato plants in the ground” and “Please put the phone down and come set the table,” and captions written for a human audience to complement an image, e.g., news article captions. The challenge of using such language-based supervision for training detection systems is that along with useful signal, the speech contains many irrelevant tokens. The project will benefit society by exploring novel avenues for overcoming this challenge and reducing the need for expensive and potentially unnatural crowdsourced labels for training. It has the potential to make object detection systems more scalable and thus more usable by a broad user base in a variety of settings. The resources and tools developed would allow natural, lightweight learning in different environments, e.g., different languages or types of imagery where the well-known object categories are not useful or where there is a shift in both the pixels as well as the way in which humans refer to objects (different cultures, medicine, art). This project opens possibilities for learning in vivo rather than in vitro; while the focus here is on object categories, multimodal weak supervision is useful for a larger variety of tasks. Research and education are integrated through local community outreach and research mentoring for students from lesser-known universities, new programs for student training including honing graduate students' writing skills, and development of interactive educational modules and demos based on research findings. 
This project creatively connects two domains, vision-and-language, and object detection, and pioneers training of object detection models with weak language supervision and a large vocabulary of potential classes. The impact of noise in the language channel will be mitigated through three complementary techniques that model visual concreteness of words, to what extent the text refers to the visual environment it appears with, and whether the weakly-supervised models that are learned are logically consistent. Two complementary word-region association mechanisms will be used (metric learning and cross-modal transformers), whose application is novel for weakly-supervised detection. Importantly, to make detection feasible, not only the semantics of image-text pairs, but their discourse relationship, will be captured. To facilitate and disambiguate the association of words to a physical environment, the latter will be represented through additional modalities, namely sound, motion, depth and touch, which are either present in the data or estimated. This project advances knowledge of how multimodal cues contextualize the relation between image and text; no prior work has modeled image-text relationships along multiple channels (sound, depth, touch, motion). Finally, to connect the appearance of objects to the purpose and use of these objects, relationships between objects, properties and actions will be semantically organized in a graph, and grammars to represent activities involving objects will be extracted, still maintaining the weakly-supervised setting.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Weakly-Supervised Action Detection Guided by Audio Narration
Improving language-supervised object detection with linguistic structure analysis
Boosting Weakly Supervised Object Detection using Fusion and Priors from Hallucinated Depth
Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining

Other publications by Adriana Kovashka

Detecting Sexually Provocative Images
Syntharch: Interactive Image Search with Attribute-Conditioned Synthesis
Inferring Visual Persuasion via Body Language, Setting, and Deep Features
Dorian: Music Recommendation Strategies using Social Network Mining
  • DOI:
  • Publication date:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Adriana Kovashka
  • Corresponding author:
    Adriana Kovashka
Interactive image search with attributes
  • DOI:
  • Publication date:
    2014-08
  • Journal:
  • Impact factor:
    0
  • Authors:
    Adriana Kovashka
  • Corresponding author:
    Adriana Kovashka

Other grants by Adriana Kovashka

RI: Small: Multilingual Supervision for Object Detection under Geographic Domain and Concept Shifts
  • Award number:
    2329992
  • Fiscal year:
    2023
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
Travel: Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award number:
    2222346
  • Fiscal year:
    2022
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
RI: Small: Domain-robust object detection through shape and context
  • Award number:
    2006885
  • Fiscal year:
    2020
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award number:
    1742714
  • Fiscal year:
    2017
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
RI: Small: Modeling Vividness and Symbolism for Decoding Visual Rhetoric
  • Award number:
    1718262
  • Fiscal year:
    2017
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
CRII: RI: Automatically Understanding the Messages and Goals of Visual Media
  • Award number:
    1566270
  • Fiscal year:
    2016
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award number:
    1630019
  • Fiscal year:
    2016
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award number:
    1529929
  • Fiscal year:
    2015
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant

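Similar grants from the National Natural Science Foundation of China

Higgs physics and dark matter research in Natural supersymmetry
  • Award number:
    11775039
  • Approval year:
    2017
  • Funding amount:
    CNY 520,000
  • Project type:
    General Program
Phenomenological study of Natural supersymmetry at the LHC
  • Award number:
    11405015
  • Approval year:
    2014
  • Funding amount:
    CNY 220,000
  • Project type:
    Young Scientists Fund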

Similar overseas grants

Challenging Cultural Norms through Asset-focused Narratives: Examining Intersecting Stigmatized Identities from Graduate Student and Faculty Perspectives in the Natural Sciences
  • Award number:
    2321219
  • Fiscal year:
    2023
  • Funding amount:
    $547,100
  • Project type:
    Standard Grant
Extraction of Symptom Burden from Clinical Narratives of Cancer Patients using Natural Language Processing
  • Award number:
    10591957
  • Fiscal year:
    2022
  • Funding amount:
    $547,100
  • Project type:
Extraction of Symptom Burden from Clinical Narratives of Cancer Patients using Natural Language Processing
  • Award number:
    10179677
  • Fiscal year:
    2021
  • Funding amount:
    $547,100
  • Project type:
National NLP Clinical Challenges (n2c2): Challenges in Natural Language Processing for Clinical Narratives
  • Award number:
    10670801
  • Fiscal year:
    2019
  • Funding amount:
    $547,100
  • Project type:
National NLP Clinical Challenges (n2c2): Challenges in Natural Language Processing for Clinical Narratives
  • Award number:
    9759499
  • Fiscal year:
    2019
  • Funding amount:
    $547,100
  • Project type:
National NLP Clinical Challenges (n2c2): Challenges in Natural Language Processing for Clinical Narratives
  • Award number:
    10393499
  • Fiscal year:
    2019
  • Funding amount:
    $547,100
  • Project type:
Using Natural Language Processing of Patient Short Narratives to Detect Cognitive Impairment
  • Award number:
    16K12489
  • Fiscal year:
    2016
  • Funding amount:
    $547,100
  • Project type:
    Grant-in-Aid for Challenging Exploratory Research
Challenges in Natural Language Processing for Clinical Narratives
  • Award number:
    8722031
  • Fiscal year:
    2012
  • Funding amount:
    $547,100
  • Project type:
Challenges in Natural Language Processing for Clinical Narratives
  • Award number:
    8913773
  • Fiscal year:
    2012
  • Funding amount:
    $547,100
  • Project type:
Challenges in Natural Language Processing for Clinical Narratives
  • Award number:
    8400218
  • Fiscal year:
    2012
  • Funding amount:
    $547,100
  • Project type: