Web-Scale Semantic Image and Video Understanding
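Basic Information
- Grant number: RGPIN-2018-04657
- Principal investigator:
- Amount: $29,900
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Duration: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract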
Visual recognition is a sub-field of computer vision that centers on building algorithms that can automatically and intelligently recognize, catalog and understand image/video content. Significant progress in accuracy and scope has been made on recognition problems in recent years, driven by powerful machine learning algorithms (e.g., deep learning), large labeled datasets and novel problem definitions. However, current approaches lack the capabilities needed for accurate, fine-grained, detailed scene understanding. Most algorithms are limited to coarse image- or video-level interpretations (e.g., choosing among 1,000 noun categories, 200 action classes, or 18 candidate answers for a visual question). Such understanding is useful, but it still severely limits the breadth of potential applications, which range from media curation to autonomous navigation.
It is difficult to quantify what level of performance or scale is necessary for visual recognition to be broadly successful. I believe the next transformative milestone is detailed scene understanding at the accuracy of the current coarse models. This requires developing the capability to recognize fine-grained object categories, numbering on the order of 100,000, and spatio-temporal (predicate) relations among the objects and elements of the scene, numbering on the order of 1,000. The former would give the ability to recognize nearly every object/noun; the latter would enable situated contextual reasoning, which is critically important for AI. My long-term research objective is to develop such accurate, detailed models for visual understanding at scale: models that can describe and localize objects and people, and reason about their spatial and functional relationships, their actions and their interactions.
This proposal tackles three fundamental sub-challenges to achieving this objective in the corresponding research threads:
1. The ever-growing set of fine-grained classes requires the development of novel data-efficient learning algorithms. As the categories to recognize become more specific, the amount of data per category decreases (e.g., there are millions of car images, but few of the 1957 Jaguar XKSS). We will build on our recent work where we developed the only method to date capable of recognizing up to 310,000 categories.
2. Moving beyond recognition of isolated objects requires reasoning about structures relating objects, people, and scene elements in space and time. Rich, flexible structured models will be developed to enable such reasoning.
3. To alleviate the black-box nature of existing architectures, which makes them unsuitable for decision-critical tasks, we will develop algorithms that enable interpretability and more human-like introspective reasoning.
Importantly, the program will also focus on applying the developed algorithms to specific recognition problems relevant to media search/retrieval, augmented reality and medical imaging.
Project Outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Sigal, Leonid
Canonical locality preserving Latent Variable Model for discriminative pose inference
- DOI: 10.1016/j.imavis.2012.06.009
- Published: 2013-03-01
- Journal:
- Impact factor: 4.7
- Authors: Tian, Yan; Sigal, Leonid; Jia, Yonghua
- Corresponding author: Jia, Yonghua
Multi-Level Semantic Feature Augmentation for One-Shot Learning
- DOI: 10.1109/tip.2019.2910052
- Published: 2019-09-01
- Journal:
- Impact factor: 10.6
- Authors: Chen, Zitian; Fu, Yanwei; Sigal, Leonid
- Corresponding author: Sigal, Leonid
Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization
- DOI: 10.1109/taffc.2016.2622690
- Published: 2018-04-01
- Journal:
- Impact factor: 11.2
- Authors: Xu, Baohan; Fu, Yanwei; Sigal, Leonid
- Corresponding author: Sigal, Leonid
HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion
- DOI: 10.1007/s11263-009-0273-6
- Published: 2010-03-01
- Journal:
- Impact factor: 19.5
- Authors: Sigal, Leonid; Balan, Alexandru O.; Black, Michael J.
- Corresponding author: Black, Michael J.
Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation
- DOI: 10.1007/s11263-011-0493-4
- Published: 2012-05-01
- Journal:
- Impact factor: 19.5
- Authors: Sigal, Leonid; Isard, Michael; Black, Michael J.
- Corresponding author: Black, Michael J.
Other grants by Sigal, Leonid
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2022
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Canada Research Chair in Computer Vision and Machine Learning
- Grant number: CRC-2017-00182
- Fiscal year: 2022
- Funding amount: $29,900
- Project category: Canada Research Chairs
Canada Research Chair in Computer Vision and Machine Learning
- Grant number: CRC-2017-00182
- Fiscal year: 2021
- Funding amount: $29,900
- Project category: Canada Research Chairs
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2021
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Canada Research Chair in Computer Vision and Machine Learning
- Grant number: CRC-2017-00182
- Fiscal year: 2020
- Funding amount: $29,900
- Project category: Canada Research Chairs
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2019
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Web-Scale Semantic Image and Video Understanding
- Grant number: 522579-2018
- Fiscal year: 2019
- Funding amount: $29,900
- Project category: Discovery Grants Program - Accelerator Supplements
Canada Research Chair in Computer Vision and Machine Learning
- Grant number: CRC-2017-00182
- Fiscal year: 2019
- Funding amount: $29,900
- Project category: Canada Research Chairs
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2018
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Web-Scale Semantic Image and Video Understanding
- Grant number: 522579-2018
- Fiscal year: 2018
- Funding amount: $29,900
- Project category: Discovery Grants Program - Accelerator Supplements
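Similar grants from the National Natural Science Foundation of China
Scale-down mechanism and regulation of traditional solid-state fermentation processes based on heat transfer
- Grant number: 22108101
- Approval year: 2021
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Transient flow and cavitation mechanisms in axial-flow blood pumps based on a multi-scale model
- Grant number: 31600794
- Approval year: 2016
- Funding amount: CNY 220,000
- Project category: Young Scientists Fund
Compact routing for scale-free networks
- Grant number: 60673168
- Approval year: 2006
- Funding amount: CNY 250,000
- Project category: General Program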
Similar overseas grants
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2022
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Semantic and pragmatic studies on the variety of minimizers in terms of polarity and scale
- Grant number: 22K00554
- Fiscal year: 2022
- Funding amount: $29,900
- Project category: Grant-in-Aid for Scientific Research (C)
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2021
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2019
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Web-Scale Semantic Image and Video Understanding
- Grant number: 522579-2018
- Fiscal year: 2019
- Funding amount: $29,900
- Project category: Discovery Grants Program - Accelerator Supplements
Web-Scale Semantic Image and Video Understanding
- Grant number: RGPIN-2018-04657
- Fiscal year: 2018
- Funding amount: $29,900
- Project category: Discovery Grants Program - Individual
Web-Scale Semantic Image and Video Understanding
- Grant number: 522579-2018
- Fiscal year: 2018
- Funding amount: $29,900
- Project category: Discovery Grants Program - Accelerator Supplements
RI: Medium: Broad-Coverage Semantic Parsing: Linguistic Representation Learning from Crowd-Scale Data
- Grant number: 1562364
- Fiscal year: 2016
- Funding amount: $29,900
- Project category: Continuing Grant
NRI: Large-Scale Collaborative Semantic Mapping using 3D Structure from Motion
- Grant number: 1426998
- Fiscal year: 2014
- Funding amount: $29,900
- Project category: Continuing Grant
Semantic change detection through large-scale learning
- Grant number: LP130100156
- Fiscal year: 2013
- Funding amount: $29,900
- Project category: Linkage Projects