Research on Large-Scale Object Recognition Methods in Open Scenes - 猫眼课题宝


Final Report

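Grant number: 61772500

Project category: General Program (面上项目)

Funding amount: 620,000 yuan (62.0 万元)

Principal investigator: 王瑞平 (Ruiping Wang)

Host institution: Institute of Computing Technology, Chinese Academy of Sciences

Discipline classification: F0210, Computer Image and Video Processing and Multimedia Technology

Year of completion: 2021

Year of approval: 2017

Project status: Completed

Project participants: 刘昊淼 (Haomiao Liu), 姜华杰 (Huajie Jiang), 乔师师, 郜迪飞, 何晨, 王文彬, 刘永, 毛懿荣, 王睿岿

Keywords: scene understanding; weakly supervised learning; attribute analysis; object recognition; transfer learning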


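Chinese Abstract (translated)

Generic object recognition is one of the core tasks of visual scene understanding. In recent years, driven by advances in deep learning and the abundance of Internet-scale data, it has made remarkable progress, and current mainstream methods even surpass the recognition ability of the human visual system on some closed-scene datasets.

This project addresses large-scale object recognition in open scenes. Building on weakly supervised learning and transfer learning theory, it explores how the complex relationships among massive numbers of object categories can be represented, studies efficient and scalable classification theory for large-scale recognition, and establishes a framework for visual knowledge mining and transfer that generalizes across scenes.

The specific research content is as follows. To handle complex inter-class relationships, the project studies hierarchical category representation models consistent with human perception and proposes a progressive recognition method with semantic attribute embedding. It analyzes the statistical properties of long-tailed data distributions, studies cross-category zero-shot learning based on shared attribute primitives, and builds an open recognition framework in which categories can be extended in real scenes. It studies visual knowledge mining from the interaction between scenes and objects, and constructs a visual object concept base with self-updating and dynamic evolution mechanisms, achieving large-scale object recognition guided by cross-scene knowledge transfer.

The project is expected to deliver theoretical innovation and technical breakthroughs and to promote the practical adoption of visual object recognition.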

English Abstract

Generic object recognition is one of the core tasks of visual scene understanding. In recent years, thanks to the development of deep learning technology and the prosperity of big Internet data, significant progress has been made. The current mainstream methods have even surpassed the performance of the human visual system under some closed-world settings.

Aiming to tackle the problem of large-scale object recognition in the open world, this project will take weakly supervised learning and transfer learning as its theoretical foundation, explore the complex correlation and representation mechanism between massive object categories, study efficient and extensible classification theory for large-scale recognition, and establish a framework for visual knowledge mining and transfer with the ability to generalize across scenes.

Specifically, this project will address the following research issues. First, to characterize the complex cross-category correlation, we will study a hierarchical class representation model that conforms to the human perception mechanism, and propose a progressive recognition method with semantic attribute embedding. Second, we will analyze the statistical properties of the long-tail distribution of data, study a cross-category zero-shot learning method based on a shared attribute dictionary, and build an open-world recognition framework with category scalability under realistic settings. Third, we will study a visual knowledge mining method that exploits the interaction between objects and scenes, construct a visual object concept database with self-update and dynamic evolution mechanisms, and finally achieve large-scale object recognition under the guidance of cross-scene knowledge transfer.

This project is expected to achieve both theoretical innovation and technological breakthroughs, and finally promote the practical application of visual object recognition.

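This project carried out in-depth research on the key problem of large-scale object recognition in open scenes and made important progress in modeling hierarchical relationships among categories, building a scalable incremental recognition framework, and mining knowledge from scene-object interactions. The main work is as follows:

(1) Regarding the intrinsic mechanism by which category hierarchies form, the project used attributes as the link between different object categories and proposed attribute-knowledge-guided methods for visual feature learning and hierarchical classification, which explicitly decouple the structured classification rules among visual categories. It also designed a multifunctional deep hashing framework that jointly embeds high-level categories and mid-level attributes, significantly improving the learning efficiency and accuracy of binary-code features.

(2) Regarding an open recognition framework with incremental category expansion, the project used knowledge as a complement to data to guide model learning and proposed a series of scalable, transferable, incremental learning methods for the incremental recognition of unknown classes. It established a mapping between the visual data space and the semantic knowledge space, transferring visual classification knowledge from known domains to unknown domains.

(3) Regarding visual knowledge mining from scene-object interactions, the project took the construction of structured scene graphs as the cornerstone for jointly describing the multi-dimensional visual concept elements in an image (entities, attributes, relations, and so on). It proposed foundational methods including scene object detection with contextual relation reasoning and hierarchical visual scene graph generation inspired by human perception mechanisms, which strongly supported the construction of a unified, progressive "object --> scene --> language" scene understanding framework.

Based on this work, 28 papers were published or accepted in mainstream international journals and conferences during the project period (including 12 CCF-A papers): 9 journal papers (including 1 in IEEE Trans. on PAMI, 2 in IJCV, and 1 in IEEE Trans. on Image Processing) and 19 conference papers (including 4 at IEEE CVPR, 4 at ICCV, and 2 at ECCV). The project also received one Best Paper Award at the IEEE CVPR2021 CLVision Workshop. The published papers have been cited 606 times on Google Scholar, with the most-cited paper receiving 172 citations. The research results establish, fairly systematically, a series of theories and methods for large-scale object recognition in open scenes, and have drawn fairly broad attention and follow-up work from international peers. One national invention patent was applied for and granted, and one software copyright was registered. With the above work as the algorithmic core, the team was runner-up in the IEEE ICCV2019 WIDER video person retrieval challenge and won the CVPR2020 CLVision incremental object recognition challenge. The project results effectively supported the technical development of the research group's related industrialization projects and promoted the broad application of large-scale object recognition and scene understanding.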

Journal Papers


Adaptive Metric Learning For Zero-Shot Recognition

DOI: 10.1109/lsp.2019.2917148
Published: 2019-09-01
Journal: IEEE Signal Processing Letters
Impact factor: 3.9
Authors: Jiang, Huajie; Wang, Ruiping; Chen, Xilin
Corresponding author: Chen, Xilin

Learning Multifunctional Binary Codes for Personalized Image Retrieval

DOI: 10.1007/s11263-020-01315-0
Published: 2020-03
Journal: International Journal of Computer Vision
Impact factor: 19.5
Authors: Haomiao Liu; Ruiping Wang; Shiguang Shan; Xilin Chen
Corresponding author: Xilin Chen

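Hash Code Learning Methods Based on Discrete Optimization (基于离散优化的哈希编码学习方法)

DOI: --
Published: 2019
Journal: 计算机学报 (Chinese Journal of Computers)
Impact factor: --
Authors: 刘昊淼 (Haomiao Liu); 王瑞平 (Ruiping Wang); 山世光 (Shiguang Shan); 陈熙霖 (Xilin Chen)
Corresponding author: 陈熙霖 (Xilin Chen)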

Attribute annotation on large-scale image database by active knowledge transfer

DOI: 10.1016/j.imavis.2018.06.012
Published: 2018-10
Journal: Image and Vision Computing
Impact factor: --
Authors: Huajie Jiang; Ruiping Wang; Yan Li; Haomiao Liu; S. Shan; Xilin Chen
Corresponding author: Huajie Jiang; Ruiping Wang; Yan Li; Haomiao Liu; S. Shan; Xilin Chen

What is a Tabby? Interpretable Model Decisions by Learning Attribute-Based Classification Criteria

DOI: 10.1109/tpami.2019.2954501
Published: 2019-11
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Impact factor: 23.6
Authors: Haomiao Liu; Ruiping Wang; Shiguang Shan; Xilin Chen
Corresponding author: Xilin Chen

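Research on Knowledge-Guided Cross-Modal Image-Text Generation Methods for Natural Scenes

Grant number: U21B2025
Project category: Joint Fund Project (联合基金项目)
Funding amount: 2.55 million yuan (255万元)
Year of approval: 2021
Principal investigator: 王瑞平 (Ruiping Wang)
Host institution: Institute of Computing Technology, Chinese Academy of Sciences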

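Nonlinear Modeling and Metric Learning for Visual Data

Grant number: 61922080
Project category: Excellent Young Scientists Fund (优秀青年科学基金项目)
Funding amount: 1.30 million yuan (130万元)
Year of approval: 2019
Principal investigator: 王瑞平 (Ruiping Wang)
Host institution: Institute of Computing Technology, Chinese Academy of Sciences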

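Research on Image Set Classification Methods for Video-Based Object Recognition

Grant number: 61379083
Project category: General Program (面上项目)
Funding amount: 760,000 yuan (76.0万元)
Year of approval: 2013
Principal investigator: 王瑞平 (Ruiping Wang)
Host institution: Institute of Computing Technology, Chinese Academy of Sciences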
