INT2-Medium: Understanding the meaning of images


Basic Information

  • Award Number:
    0803603
  • Principal Investigator:
  • Amount:
    $550,000
  • Host Institution:
  • Host Institution Country:
    United States
  • Award Type:
    Standard Grant
  • Fiscal Year:
    2008
  • Funding Country:
    United States
  • Project Period:
    2008-08-15 to 2012-07-31
  • Project Status:
    Completed

Project Abstract

The ability to recognize objects in images is a core problem in computer vision. The last decade has seen astonishing advances in our methods for building object detectors. However, images convey richer information about the objects depicted in them: objects may form a scene ("A view of mountains and meadows"); objects stand in relations to one another ("The cat sits on the mat"); different instances may look different ("The tabby cat sits on the blue mat"); objects may act on others ("The cat is chasing the mouse"). The task of identifying the entities depicted in an image, together with their attributes and relations, is image understanding. It poses a number of new research questions: Which objects should one remark on? What attributes of, and relations between, the objects depicted in the image are important? That is, what is the visually salient information conveyed in an image?

Many images (e.g. a large fraction of those on the web) are accompanied by text that describes or gives additional information about the entities depicted in them. The entities referred to in this text are typically visually salient ones. This correspondence between the information conveyed in the text and in the image can be used in the creation of image understanding systems. Much current work treats image annotations as collections of individual words. The richer representations of meaning required to train image understanding systems can be obtained if annotating text is treated as sentences rather than just bags of words. Sentences provide cues to what is salient in an image, what salient objects are likely to look like (e.g. color, texture and form), and what relations might hold between them. Exposing this information will provide a rich body of training data for the next generation of computer vision systems.

Research in natural language processing has created statistical wide-coverage parsers that can recover the semantic interpretation of sentences. These parsers differ from purely syntactic parsers in that they are based on linguistically expressive grammars that allow such interpretations to be built directly from the syntactic analysis. However, linking sentences with accompanying images requires a level of representation that goes beyond lists of the entities, states and events mentioned in a sentence. The writer of an image caption will typically assume that the reader sees the image, and can therefore refer to the entities depicted in it as known to the reader. There is a need for parsers that are able to uncover the information structure of sentences -- what information is assumed to be shared knowledge between speaker and hearer, and what new information is asserted by the sentence. How information structure is encoded in natural language is well understood, and it can be modeled with the same kinds of grammars used by parsers that return semantic interpretations. Although there are currently no large corpora annotated with information structure, the project will exploit the correspondence between images and their captions to develop novel, partially supervised training regimes for parsers. These training regimes could also enable the bootstrapping of parsers for languages with little or no annotated training data.

This project will build a novel parser that recovers richer linguistic representations, including information structure. It will also build a novel image understanding system that recovers the salient entities depicted in an image together with their attributes and relations.
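To make the "richer than a bag of words" representation concrete, the following is a minimal, hypothetical Python sketch of the kind of interpretation a caption such as "The tabby cat sits on the blue mat" might yield (entities with attributes, linked by relations). The class names and fields are invented for illustration and are not part of the project's actual software.

    from dataclasses import dataclass, field

    @dataclass
    class Entity:
        name: str                                       # head noun, e.g. "cat"
        attributes: list = field(default_factory=list)  # e.g. ["tabby"]

    @dataclass
    class Relation:
        predicate: str                                  # e.g. "sit_on"
        subject: Entity
        obj: Entity

    # Hand-built interpretation of "The tabby cat sits on the blue mat".
    cat = Entity("cat", ["tabby"])
    mat = Entity("mat", ["blue"])
    scene = [Relation("sit_on", subject=cat, obj=mat)]

    for r in scene:
        print(f"{' '.join(r.subject.attributes)} {r.subject.name} "
              f"--{r.predicate}--> {' '.join(r.obj.attributes)} {r.obj.name}")
    # tabby cat --sit_on--> blue mat

A bag-of-words annotation ("cat", "mat", "tabby", "blue") would lose exactly the structure this sketch keeps: which attribute belongs to which object, and who is doing what to whom.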
The project will train these systems both separately, on datasets consisting of sentences marked up with correct parses and images marked up with labels attached to objects, and jointly, on a dataset of captioned images.

Intellectual merits: The project goals are ambitious, but within reach, because both object recognition and parsing technology have advanced significantly. The project presents the vision and parsing communities with new goals that are practically important and technically demanding. The aim of integrating natural language processing and computer vision creates a novel impetus to develop parsers that return richer linguistic representations, which will in turn have a deep impact on research within the natural language processing community itself. It will open up key directions in computer vision and natural language processing by demanding and enabling the recovery of richer representations of linguistic and visual information, and by studying how linguistic descriptions are grounded in the visual world.

Broader impact: The project has significant practical implications in a number of areas, such as image search and natural language interfaces for robotics, and will ultimately pave the way for new applications such as automatic captioning systems. The resulting advances in object recognition offer possibilities for safer autonomous vehicles, safer homes with better home care, and efficient management of surveillance data.

URL: http://luthuli.cs.uiuc.edu/~daf/meaningofimages.html
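As a rough illustration of the weak supervision that joint training on captioned images can provide, here is a minimal, hypothetical Python sketch (not the project's actual pipeline): entities and attributes recovered from a caption's parse are matched to detector outputs, yielding attribute-labelled regions that object labels alone would not supply. All names and data below are invented for illustration.

    # Pretend detector output for one image: (category label, bounding box).
    detections = [
        ("cat",   (30, 40, 120, 160)),
        ("mat",   (10, 150, 200, 220)),
        ("plant", (180, 20, 230, 90)),
    ]

    # Entities and attributes recovered from the parsed caption
    # "The tabby cat sits on the blue mat".
    caption_entities = {
        "cat": ["tabby"],
        "mat": ["blue"],
    }

    # Align by label: each match pairs a region with the attributes the caption
    # asserts about that entity -- supervision a detector alone does not give.
    training_pairs = [
        (box, caption_entities[label])
        for label, box in detections
        if label in caption_entities
    ]

    print(training_pairs)
    # [((30, 40, 120, 160), ['tabby']), ((10, 150, 200, 220), ['blue'])]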

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by David Forsyth

Supplement - Convex Decomposition of Indoor Scenes
  • DOI:
  • Publication Date:
  • Journal:
  • Impact Factor:
    0
  • Authors:
    David Forsyth
  • Corresponding Author:
    David Forsyth
Hidden Markov Models
  • DOI:
    10.1007/978-3-030-18114-7_13
  • Publication Date:
    2019
  • Journal:
  • Impact Factor:
    0
  • Authors:
    David Forsyth
  • Corresponding Author:
    David Forsyth
Preserving Image Properties Through Initializations in Diffusion Models
Scientific report on Modeling and Prediction of Human Intent for Primitive Activation
  • DOI:
  • Publication Date:
    2014
  • Journal:
  • Impact Factor:
    0
  • Authors:
    David Forsyth
  • Corresponding Author:
    David Forsyth
Fully spectrum-sliced four-wave mixing wavelength conversion in a Semiconductor Optical Amplifier

Other Grants by David Forsyth

RI: Medium: Creating Knowledge with All-Novel-Class Computer Vision
  • Award Number:
    2106825
  • Fiscal Year:
    2021
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Collaborative Research: Computational Behavioral Science: Modeling, Analysis, and Visualization of Social and Communicative Behavior
  • Award Number:
    1029035
  • Fiscal Year:
    2010
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
RI: Small: Exploiting Geometric and Illumination Context in Indoor Scenes
  • Award Number:
    0916014
  • Fiscal Year:
    2009
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Interpreting Human Behaviour in Video using FSA's and Object Context
  • Award Number:
    0534837
  • Fiscal Year:
    2006
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Finding and Tracking People from the Bottom Up
  • Award Number:
    0098682
  • Fiscal Year:
    2001
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Purchase of a Molecular Modeling System
  • Award Number:
    9974642
  • Fiscal Year:
    1999
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
SGER: MCMC Algorithms for Object Recognition
  • Award Number:
    9979201
  • Fiscal Year:
    1999
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
A Spiral Approach to Chemical Concepts Using GC/MS
  • Award Number:
    9850580
  • Fiscal Year:
    1998
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Workshop on Shape, Contour and Grouping
  • Award Number:
    9712426
  • Fiscal Year:
    1997
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Recognising curved surfaces from their outlines
  • Award Number:
    9596025
  • Fiscal Year:
    1994
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant

Similar NSFC Grants

Plasmon-enhanced optical response in composite low-dimensional topological materials
  • Award Number:
    12374288
  • Approval Year:
    2023
  • Funding Amount:
    CNY 520,000
  • Award Type:
    General Program
The disappearing medium-sized enterprises from the perspective of market management and intervention in the division of labor: stylized facts, underlying mechanisms, and optimization paths
  • Award Number:
    72374217
  • Approval Year:
    2023
  • Funding Amount:
    CNY 410,000
  • Award Type:
    General Program
Multiscale algorithms and numerical simulation of plasmas in tokamak divertors
  • Award Number:
    12371432
  • Approval Year:
    2023
  • Funding Amount:
    CNY 435,000
  • Award Type:
    General Program
Dark matter distribution near intermediate-mass black holes and gravitational-wave echo detection in their IMRI systems
  • Award Number:
    12365008
  • Approval Year:
    2023
  • Funding Amount:
    CNY 320,000
  • Award Type:
    Regional Science Fund Program
Physical mechanisms of rapid intensification of asymmetric tropical cyclones under moderate vertical wind shear
  • Award Number:
    42305004
  • Approval Year:
    2023
  • Funding Amount:
    CNY 300,000
  • Award Type:
    Young Scientists Fund

Similar Overseas Grants

Theoretical and empirical research on secondary science curriculum development aimed at understanding "scientific uncertainty"
  • Award Number:
    24KJ1727
  • Fiscal Year:
    2024
  • Funding Amount:
    $550,000
  • Award Type:
    Grant-in-Aid for JSPS Fellows
Collaborative Research: SaTC: CORE: Medium: Understanding the Impact of Privacy Interventions on the Online Publishing Ecosystem
  • Award Number:
    2237329
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Understanding and Combatting Impersonation Attacks and Data Leakage in Online Advertising
  • Award Number:
    2247516
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant
Postdoctoral Fellowship: AAPF: All Shook Up: Understanding the Chemistry, Dynamics, and Kinematics of the Diffuse Interstellar Medium
  • Award Number:
    2303902
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Fellowship Award
Collaborative Research: SaTC: TTP: Medium: iDRAMA.cloud: A Platform for Measuring and Understanding Information Manipulation
  • Award Number:
    2247867
  • Fiscal Year:
    2023
  • Funding Amount:
    $550,000
  • Award Type:
    Continuing Grant