Research on Deep-Learning-Based Automatic Generation of Image Text Descriptions


Final Report

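Approval No.: 61806218
Project category: Young Scientists Fund
Funding: CNY 190,000 (19.0 万元)
Principal investigator: Guo Yanming (郭延明)
Host institution: National University of Defense Technology, PLA (中国人民解放军国防科技大学)
Discipline: F0604, Machine Perception and Machine Vision
Completion year: 2021
Approval year: 2018
Project status: Completed
Participants: Xie Yuxiang (谢毓湘), Bai Liang (白亮), Guo Jinlin (郭金林), Zhang Xin (张芯), Liang Jingyun (梁经韵), Liu Shuang (刘爽), Lao Mingrui (老明锐)
Keywords: image-text matching; deep learning; natural language processing; attention model; image captioning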


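Abstract (Chinese version, translated)

Automatic generation of image text descriptions (image captioning) is a cross-disciplinary task that combines computer vision and natural language processing, and it has attracted wide attention in recent years. This project focuses on deep-learning-based image captioning and aims to explore new methods whose captions are accurate and consistent with how people express themselves. Specifically, the project plans to study the following: (1) for image feature extraction, a scheme that fuses global and local features, capturing the global content of the image while adaptively attending to different local regions depending on the word being generated; (2) for text generation, the differences between words in frequency and semantic content, introducing a word-frequency constraint so that low-frequency, semantically rich words are not drowned out by high-frequency, semantically weak ones, and a grammar constraint so that generated sentences are grammatically well formed; (3) for the overall framework, an image-text cross-media association module added to the conventional captioning framework, so that the extracted image features better serve text generation. In addition, the two dual tasks "image→text" and "text→image" are trained in a unified framework so that they reinforce each other and provide supervisory feedback to the captioning task.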

English Abstract

As a joint task of computer vision and natural language processing, image captioning has received increasing attention in recent years. This research aims to explore new deep-learning-based image captioning models that are accurate and in line with human habits of expression. Specifically, the research consists of the following three subjects: (1) for image feature extraction, this project investigates a global-local fusion method, which not only reflects the global image content but also automatically attends to a local image region according to the word being generated; (2) for textual sentence generation, this project considers the differences between words in frequency and semantic meaning. On the one hand, it introduces a word-frequency constraint to ensure that words with low frequency but high semantic content are not drowned out by words with high frequency but low semantic content. On the other hand, it introduces a grammatical constraint to ensure that the generated sentence is grammatically well formed; (3) for the overall framework, this project introduces an image-text matching module into the conventional captioning framework, in order to make the image features more suitable for generating the textual sentence. In addition, this project seeks to place the dual tasks, i.e. image→text and text→image, within a unified framework so that they are trained together and promote each other, which gives feedback to the image captioning model during the training phase.
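Item (1) of the abstract can be illustrated with a minimal sketch. It is not the project's model: it assumes plain scaled dot-product attention, and the names `fuse_global_local`, `global_feat`, `local_feats`, and `query` are hypothetical. The `query` vector stands in for whatever representation of the current word a real decoder would supply.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_global_local(global_feat, local_feats, query):
    """Attend over local region features, then append the result to the global feature.

    global_feat: feature vector for the whole image (assumed given)
    local_feats: one feature vector per image region (assumed given)
    query:       vector for the word being generated (hypothetical)
    Returns (fused_feature, attention_weights).
    """
    # Score each region against the query (scaled dot-product attention).
    scale = math.sqrt(len(query))
    scores = [dot(query, f) / scale for f in local_feats]
    attn = softmax(scores)
    # The local context is the attention-weighted average of the region features.
    dim = len(local_feats[0])
    context = [sum(a * f[i] for a, f in zip(attn, local_feats)) for i in range(dim)]
    # Fusion by concatenation: global part first, attended local part second.
    return global_feat + context, attn
```

Concatenation is only one possible fusion choice; summation or a learned gate would fit the same description equally well.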

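Image captioning is a cross-disciplinary task combining computer vision and natural language processing, and it matters both for the development of artificial intelligence and for everyday needs. The project's research on this task makes four main contributions: 1) a global-local fused image feature extraction scheme, implemented as PFNet, which chains a local feature extractor with a two-stage classification network, combines the global features of an image with its local regions, and improves the discriminability of image features; 2) image-text matching with cycle-consistent embedding, which accounts for inter-modal and intra-modal association and consistency and learns robust visual-textual correspondences, together with a novel text-guided self-attention architecture that fuses a text guidance vector into the decoder to better bridge the semantic gap between image and text and thereby generate more accurate descriptions; 3) a caption generation method based on word-frequency and grammar constraints, which modifies the training loss and incorporates part-of-speech constraints to improve the informativeness and grammatical accuracy of the descriptions; 4) an end-to-end trainable dual prediction network (DPN), which adds a backward reconstruction prediction to the conventional forward prediction; this extra constraint on the prediction process better exploits the relationship between input and output targets and improves captioning quality. Under this project, 5 graduate students completed their degrees, 12 papers were published (including at the top-tier CCF-A international conference MM and in the SCI-indexed journal Neurocomputing), and 8 patent applications were filed, providing theoretical support for applying image captioning in other fields.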
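The idea in contribution 3) of changing the training loss so that rare, informative words carry more weight can be sketched as follows. This is an illustrative assumption, not the loss used in the project: it weights each word's negative log-likelihood by an IDF-style factor, and the names `inverse_frequency_weights` and `weighted_nll` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(corpus, smoothing=1.0):
    """Map each word to a weight that grows as the word gets rarer.

    corpus: list of tokenized captions (lists of words).
    The weight log(smoothing + total / count) is an assumed IDF-style choice.
    """
    counts = Counter(word for sentence in corpus for word in sentence)
    total = sum(counts.values())
    return {w: math.log(smoothing + total / c) for w, c in counts.items()}


def weighted_nll(predicted_probs, target_words, weights):
    """Frequency-weighted negative log-likelihood of one caption.

    predicted_probs: one dict per time step, mapping word -> probability
    target_words:    the ground-truth word at each time step
    weights:         per-word weights from inverse_frequency_weights
    """
    loss = 0.0
    for probs, word in zip(predicted_probs, target_words):
        # A mistake on a rare word is penalized more than on a common one.
        loss += -weights[word] * math.log(probs[word])
    return loss / len(target_words)
```

With such a weighting, assigning probability 0.5 to a rare ground-truth word costs more than assigning 0.5 to a frequent one, which is the behavior the word-frequency constraint is described as encouraging.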

Journal Papers


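Scene Classification Method Based on Local Feature Saliency (基于局部特征显著化的场景分类方法)
DOI: --
Published: 2020
Journal: Journal of Signal Processing (信号处理)
Impact factor: --
Authors: Zhang Jiahui (张家辉); Xie Yuxiang (谢毓湘); Guo Yanming (郭延明)
Corresponding author: Guo Yanming (郭延明)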

Multi-stage hybrid embedding fusion network for visual question answering
DOI: 10.1016/j.neucom.2020.10.071
Published: 2021
Journal: Neurocomputing
Impact factor: 6
Authors: Mingrui Lao; Yanming Guo; Nan Pu; Wei Chen; Yu Liu; M. Lew
Corresponding author: Mingrui Lao; Yanming Guo; Nan Pu; Wei Chen; Yu Liu; M. Lew

CFAM: Estimating 3D Hand Poses from a Single RGB Image with Attention
DOI: 10.3390/app10020618
Published: 2020-01
Journal: Applied Sciences
Impact factor: --
Authors: Xianghan Wang; Jie Jiang; Yanming Guo; Lai Kang; Yingmei Wei; Dan Li
Corresponding author: Xianghan Wang; Jie Jiang; Yanming Guo; Lai Kang; Yingmei Wei; Dan Li

AAE-SC: A scRNA-Seq Clustering Framework Based on Adversarial Autoencoder
DOI: 10.1109/access.2020.3027481
Published: 2020-01-01
Journal: IEEE Access
Impact factor: 3.9
Authors: Wu, Yulun; Guo, Yanming; Lao, Songyang
Corresponding author: Lao, Songyang

Attentional Feature Refinement and Alignment Network for Aircraft Detection in SAR Imagery
DOI: 10.1109/tgrs.2021.3139994
Published: 2022-01
Journal: IEEE Transactions on Geoscience and Remote Sensing
Impact factor: 8.2
Authors: Yan Zhao; Lingjun Zhao; Zhongkang Liu; Dewen Hu; Gangyao Kuang; Li Liu
Corresponding author: Yan Zhao; Lingjun Zhao; Zhongkang Liu; Dewen Hu; Gangyao Kuang; Li Liu

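Research on Robust and Efficient Adversarial Training Methods Adaptive to Adversarial Scenarios

Approval No.: 2025JJ40066
Project category: Provincial/municipal project
Funding: CNY 0 (0.0 万元)
Approval year: 2025
Principal investigator: Guo Yanming (郭延明)
Host institution: National University of Defense Technology, PLA (中国人民解放军国防科技大学)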

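Research on Deep-Learning-Based Visual Question Answering Methods

Approval No.: 2019JJ50722
Project category: Provincial/municipal project
Funding: CNY 0 (0.0 万元)
Approval year: 2019
Principal investigator: Guo Yanming (郭延明)
Host institution: National University of Defense Technology, PLA (中国人民解放军国防科技大学)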
