INT2-Medium: Understanding the meaning of images


Basic Information

  • Award Number:
    0803603
  • Principal Investigator:
  • Amount:
    $550,000
  • Host Institution:
  • Host Institution Country:
    United States
  • Award Type:
    Standard Grant
  • Fiscal Year:
    2008
  • Funding Country:
    United States
  • Project Period:
    2008-08-15 to 2012-07-31
  • Project Status:
    Completed

Project Abstract

The ability to recognize objects in images is a core problem in computer vision. The last decade has seen astonishing advances in our methods to build object detectors. However, images convey richer information about the objects depicted in them: objects may form a scene ("A view of mountains and meadows"); objects are in relations with one another ("The cat sits on the mat"); different instances may look different ("The tabby cat sits on the blue mat"); objects may be acting on others ("The cat is chasing the mouse"). This task of identifying the entities depicted in images, together with their attributes and relations, is image understanding. It poses a number of new research questions: What objects should one remark on? What attributes of, and relations between, the objects depicted in the image are important? That is, what is the visually salient information conveyed in an image?

Many images (e.g. a large fraction of those on the web) are accompanied by text which describes or gives additional information about the entities depicted in them. The entities referred to in this text are typically the visually salient ones. This correspondence between the information conveyed in the text and the image can be used in the creation of image understanding systems. Much current work treats image annotations as consisting of individual words. The richer representations of meaning required to train image understanding systems can be obtained if annotating text is treated as sentences (rather than just bags of words). Sentences provide cues to: what is salient in an image; what salient objects likely look like (e.g. color, texture and form); and what relations might appear between them. Exposing this information will provide a rich body of training data for the next generation of computer vision systems.

Research in natural language processing has created statistical wide-coverage parsers that can recover the semantic interpretation of sentences. These parsers differ from purely syntactic parsers in that they are based on linguistically expressive grammars that allow such interpretations to be built directly from the syntactic analysis. However, linking sentences with accompanying images requires a level of representation that goes beyond lists of the entities, states and events mentioned in a sentence. The writer of an image caption will typically assume that the reader sees the image, and can therefore refer to the entities depicted in it as known to the reader. There is a need for parsers that are able to uncover the information structure of sentences -- what information is assumed to be shared knowledge between speaker and hearer, and what is new information asserted by the sentence. How information structure is encoded in natural language is well understood, and it can be modeled with the same kinds of grammars that are used by those parsers that return semantic interpretations. Although there are currently no large corpora annotated with information structure, we will exploit the correspondence between images and their captions to develop novel, partially supervised training regimes for parsers. These training regimes could also enable the bootstrapping of parsers for languages with little or no annotated training data.

This project will build a novel parser that recovers richer linguistic representations, including information structure. It will also build a novel image understanding system that recovers the salient entities depicted in an image together with their attributes and relations.
The project will train these systems both separately, on datasets consisting of sentences marked up with correct parses and images marked up with labels attached to objects, and jointly, on a dataset of captioned images.

Intellectual merits: The project goals are ambitious, but within reach, because both object recognition and parsing technology have advanced significantly. The project presents the vision and parsing communities with new goals, which are practically important and technically demanding. The aim of integrating natural language processing and computer vision creates a novel impetus to develop parsers that return richer linguistic representations, which will in turn have a deep impact on research within the natural language processing community itself. It will open up key directions in computer vision and natural language processing by demanding and enabling the recovery of richer representations of linguistic and visual information, and by studying how linguistic descriptions are grounded in the visual world.

Broader impact: The project has significant practical implications in a number of areas, such as image search and natural language interfaces for robotics, and will ultimately pave the way for new applications such as automatic captioning systems. The resulting advances in object recognition offer possibilities for the creation of safer autonomous vehicles, safer homes through better home care, and efficient management of surveillance data.

URL: http://luthuli.cs.uiuc.edu/~daf/meaningofimages.html
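To make the kind of representation the abstract argues for concrete, the sketch below shows, in Python, how a caption treated as a sentence (rather than a bag of words) can yield (entity, attribute) pairs and a subject-relation-object triple of the sort described above. This is a minimal, hypothetical illustration, not the project's parser: the toy lexicons and the parse_caption helper are invented for the example, and a real system would use a wide-coverage statistical parser as the abstract describes.

```python
# Toy sketch (hypothetical, for illustration only): extract the kind of
# structured meaning the abstract describes -- salient entities, their
# attributes, and a relation between them -- from a simple caption,
# instead of treating the caption as a bag of words.
from dataclasses import dataclass, field

DETERMINERS = {"the", "a", "an"}
ATTRIBUTES = {"tabby", "blue", "striped", "small"}           # cues to what objects look like
RELATION_WORDS = {"sits", "on", "chases", "is", "chasing"}   # cues to relations between objects

@dataclass
class Entity:
    head: str                                      # e.g. "cat"
    attributes: list[str] = field(default_factory=list)   # e.g. ["tabby"]

def parse_caption(caption: str):
    """Handle simple subject-relation-object captions; return (entities, triple)."""
    words = [w for w in caption.lower().rstrip(".").split() if w not in DETERMINERS]
    entities: list[Entity] = []
    relation: list[str] = []
    pending: list[str] = []
    for w in words:
        if w in ATTRIBUTES:
            pending.append(w)          # attach attribute words to the next noun
        elif w in RELATION_WORDS:
            relation.append(w)         # collect relation words between the nouns
        else:
            entities.append(Entity(head=w, attributes=pending))
            pending = []
    triple = None
    if len(entities) == 2 and relation:
        triple = (entities[0].head, " ".join(relation), entities[1].head)
    return entities, triple

if __name__ == "__main__":
    ents, rel = parse_caption("The tabby cat sits on the blue mat.")
    print(ents)  # [Entity(head='cat', attributes=['tabby']), Entity(head='mat', attributes=['blue'])]
    print(rel)   # ('cat', 'sits on', 'mat')
```

The output pairs the heads "cat" and "mat" with their attributes and recovers the triple (cat, sits on, mat); structures of this shape, aligned with the accompanying image, are the kind of training signal the abstract proposes to exploit.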

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Publications by David Forsyth

Supplement - Convex Decomposition of Indoor Scenes
  • DOI:
  • Publication Date:
  • Journal:
  • Impact Factor:
    0
  • Author:
    David Forsyth
  • Corresponding Author:
    David Forsyth
Hidden Markov Models
  • DOI:
    10.1007/978-3-030-18114-7_13
  • Publication Date:
    2019
  • Journal:
  • Impact Factor:
    0
  • Author:
    David Forsyth
  • Corresponding Author:
    David Forsyth
Preserving Image Properties Through Initializations in Diffusion Models
Fully spectrum-sliced four-wave mixing wavelength conversion in a Semiconductor Optical Amplifier
Scientific report on Modeling and Prediction of Human Intent for Primitive Activation
  • DOI:
  • Publication Date:
    2014
  • Journal:
  • Impact Factor:
    0
  • Author:
    David Forsyth
  • Corresponding Author:
    David Forsyth


Other Grants by David Forsyth

RI: Medium: Creating Knowledge with All-Novel-Class Computer Vision
  • Award Number:
    2106825
  • Fiscal Year:
    2021
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Collaborative Research: Computational Behavioral Science: Modeling, Analysis, and Visualization of Social and Communicative Behavior
  • Award Number:
    1029035
  • Fiscal Year:
    2010
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
RI: Small: Exploiting Geometric and Illumination Context in Indoor Scenes
  • Award Number:
    0916014
  • Fiscal Year:
    2009
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Interpreting Human Behaviour in Video using FSA's and Object Context
  • Award Number:
    0534837
  • Fiscal Year:
    2006
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Finding and Tracking People from the Bottom Up
  • Award Number:
    0098682
  • Fiscal Year:
    2001
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Purchase of a Molecular Modeling System
  • Award Number:
    9974642
  • Fiscal Year:
    1999
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
SGER: MCMC Algorithms for Object Recognition
  • Award Number:
    9979201
  • Fiscal Year:
    1999
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
A Spiral Approach to Chemical Concepts Using GC/MS
  • Award Number:
    9850580
  • Fiscal Year:
    1998
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Workshop on Shape, Contour and Grouping
  • Award Number:
    9712426
  • Fiscal Year:
    1997
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Recognising curved surfaces from their outlines
  • Award Number:
    9596025
  • Fiscal Year:
    1994
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant

Similar Overseas Grants

Collaborative Research: SaTC: CORE: Medium: Understanding the Impact of Privacy Interventions on the Online Publishing Ecosystem
  • Award Number:
    2237329
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Understanding and Combatting Impersonation Attacks and Data Leakage in Online Advertising
  • Award Number:
    2247516
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Postdoctoral Fellowship: AAPF: All Shook Up: Understanding the Chemistry, Dynamics, and Kinematics of the Diffuse Interstellar Medium
  • Award Number:
    2303902
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Fellowship Award
Collaborative Research: SaTC: TTP: Medium: iDRAMA.cloud: A Platform for Measuring and Understanding Information Manipulation
  • Award Number:
    2247867
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Understanding the Impact of Privacy Interventions on the Online Publishing Ecosystem
  • Award Number:
    2237328
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Understanding the Impact of Privacy Interventions on the Online Publishing Ecosystem
  • Award Number:
    2237327
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
  • Award Number:
    2310113
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: SaTC: TTP: Medium: iDRAMA.cloud: A Platform for Measuring and Understanding Information Manipulation
  • Award Number:
    2247868
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant