Research on Text-Image Summarization Methods Based on Recurrent Attention Neural Networks

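Final Report

Approval No.: 61806101
Project category: Young Scientists Fund
Funding: CNY 200,000 (20.0 万元)
Principal investigator: Chen Jingqiang (陈景强)
Host institution: Nanjing University of Posts and Telecommunications (南京邮电大学)
Discipline: F0606. Natural Language Processing
Year concluded: 2021
Year approved: 2018
Project status: Concluded
Participants: Xu Bei (胥备), Long Xianzhong (龙显忠), Chen Jiade (陈家德), Cao Jianru (曹剑茹), Hao Wei (郝伟)
Keywords: text-image summarization; cross-modal summarization; recurrent neural network; attention model; single-document summarization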


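Abstract (translated from Chinese)

With the explosive growth of multimodal documents that contain images on the Internet (such as news articles and blogs), text-image summaries that combine text and pictures are becoming an urgent need for obtaining information better and faster. To address the absence of image information in current text summarization methods, and drawing on the strengths of recurrent attention neural networks in sequence generation and classification, this project establishes new text-image summarization methods and studies the key techniques of text-image summarization combined with deep learning. There are three main innovations. First, a text-image alignment model based on a two-level attention model, which aligns sentences with images and words with pixels to uncover text-image alignment relations that are used for generating and laying out text-image summaries. Second, an abstractive text-image summarization method based on a two-level attention encoder-decoder model, which attends to both images and text, first decodes sentence representations and then word representations, and finally selects and lays out images to generate the summary. Third, an extractive text-image summarization method based on recurrent neural networks, which measures sentence importance by its coverage of image information and its coverage of text information, and selects and lays out important sentences and images to extract the summary. The project provides a new paradigm for text-image summarization and a new text-image alignment method, and has significant research and application value.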

Abstract (English)

With the rapid growth of multimodal documents containing pictures, such as news and blogs, text-picture summaries that contain both text and pictures have become an urgent need for people to obtain information better and faster. To address the lack of picture information in traditional text summarization methods, this project builds a new text-picture summarization method based on attentional recurrent neural networks, and studies the key text-picture summarization techniques by combining deep learning with picture information. The main innovations are as follows. 1) We propose text-picture alignment methods based on a hierarchical attention model, which align sentences with pictures and words with pixels. The model mines the hidden text-picture alignment relationships, which are used for summary creation and summary layout. 2) We propose a generative text-picture summarization method based on a hierarchical attentional encoder-decoder model that combines attention to pictures with attention to text. The model generates text-picture summaries by first generating sentence representations, then generating word representations, then selecting pictures, and finally arranging the generated text and pictures. 3) We propose an extractive text-picture summarization method based on recurrent neural networks. The model computes the importance of a sentence from its coverage of the picture information and its coverage of the text information, and selects and arranges important sentences and pictures to form extractive summaries. This project provides a new paradigm of multimodal summarization and a text-picture alignment method, and thus has significant scientific and practical value.
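The idea of attending to text and pictures at the same time can be illustrated with a small sketch. It is not the project's model: dot-product scoring, the fixed mixing weight `text_share`, and all function names are assumptions made for illustration, whereas a trained model would learn its scoring function and the way the two contexts are fused.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Dot-product attention (an assumption): weights over keys and their weighted sum."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return weights, context


def multimodal_context(query, sentence_vecs, image_vecs, text_share=0.5):
    """Attend to sentences and images separately, then mix the two contexts.

    text_share is a hypothetical fixed mixing weight used only in this sketch.
    """
    text_weights, text_ctx = attend(query, sentence_vecs)
    image_weights, image_ctx = attend(query, image_vecs)
    context = [text_share * t + (1.0 - text_share) * v
               for t, v in zip(text_ctx, image_ctx)]
    return context, text_weights, image_weights


if __name__ == "__main__":
    decoder_state = [1.0, 0.0]
    sentences = [[1.0, 0.0], [0.0, 1.0]]
    images = [[0.0, 1.0], [2.0, 0.0]]
    ctx, tw, iw = multimodal_context(decoder_state, sentences, images)
    print(ctx, tw, iw)
```

In a sketch like this, the image attention weights can also serve as a rough alignment signal: the picture with the largest weight while a sentence is being decoded is a natural candidate to place next to that sentence.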

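The growth of multimodal document data on the Internet has led to the problem of "information overload". Multimodal automatic summarization can use multimodal data to produce summaries that combine text and images, and is an important way to address information overload. At the same time, deep learning is good at exploiting and fusing text and image information, which provides practical technical support for multimodal summarization. This project therefore studied multimodal summarization methods based on recurrent neural networks. The main research topics are: a text-image alignment method based on a hierarchical attention model (research topic 1), an abstractive text-image summarization method based on hierarchical recurrent attention neural networks (research topic 2), and an extractive text-image summarization method based on recurrent attention neural networks (research topic 3). The project published five first-author papers and one corresponding-author paper, described as follows.

(1) An abstractive text-image summarization model based on a hierarchical multimodal attention recurrent neural network was proposed, and the E-Dailymail dataset was built. Sentences, images, and captions are encoded with RNNs and CNNs respectively, and a multimodal attention mechanism aligns summary sentences with images. During decoding, the model attends to sentences, images, and captions at the same time to generate the text summary, then selects relevant images, adds them to the summary, and aligns them. Experiments show that the model outperforms summarization methods that ignore images and aligns text with images well, which demonstrates the value of images for improving summary quality. This result corresponds to research topics 1 and 2 and was published at EMNLP 2018.

(2) An extractive text-image summarization method based on a multimodal recurrent neural network was proposed. Extractive multimodal summarization is treated as a classification problem: sentences, images, and captions are first encoded, and a logistic classifier then computes the sentence selection probability and the sentence-image alignment probability. The features include text coverage, text redundancy, image-set coverage, and image redundancy, and two methods for computing image redundancy were proposed. Experiments on the E-Dailymail dataset show that the method outperforms text-only summarization methods, which indicates that adding images to extractive summarization can improve the quality of the text summary, uncover hidden sentence-image alignments, and produce good extractive multimodal summaries. This result corresponds to research topics 1 and 2 and was published in FGCS, a Zone 2 journal.

(3) A news image captioning method based on a multimodal pointer-generator network was proposed. Textual attention and visual attention are used to compute the pointer distribution, which is then used to generate image captions. Experiments on the DailyMail and BBC datasets show that the model outperforms the original pointer-generator network and several baseline methods, and that textual and visual attention both help improve automatic captioning. This result corresponds to research topics 1 and 2 and was published in CCPE, a CCF rank-C journal.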
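As a rough illustration of the extractive formulation in item (2), the sketch below scores each sentence with a logistic function over the four named feature groups (text coverage, text redundancy, image-set coverage, image redundancy). The weights, bias, threshold, and all identifiers are hypothetical placeholders; in the published method the features come from learned encodings and the parameters are trained, none of which is reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    text_coverage: float     # how much of the document text the sentence covers
    text_redundancy: float   # overlap with sentences already selected
    image_coverage: float    # how much of the image set the sentence covers
    image_redundancy: float  # overlap with image content already covered


# Hypothetical parameters, chosen by hand for this sketch only.
WEIGHTS = {
    "text_coverage": 2.0,
    "text_redundancy": -1.5,
    "image_coverage": 1.5,
    "image_redundancy": -1.0,
}
BIAS = -1.0


def selection_probability(f: SentenceFeatures) -> float:
    """Logistic score: sigmoid of a weighted sum of the four features."""
    z = (BIAS
         + WEIGHTS["text_coverage"] * f.text_coverage
         + WEIGHTS["text_redundancy"] * f.text_redundancy
         + WEIGHTS["image_coverage"] * f.image_coverage
         + WEIGHTS["image_redundancy"] * f.image_redundancy)
    return 1.0 / (1.0 + math.exp(-z))


def select_sentences(features, threshold=0.5):
    """Return indices of sentences whose selection probability reaches the threshold."""
    return [i for i, f in enumerate(features)
            if selection_probability(f) >= threshold]


if __name__ == "__main__":
    candidates = [
        SentenceFeatures(0.9, 0.1, 0.8, 0.1),  # informative, little overlap
        SentenceFeatures(0.1, 0.9, 0.1, 0.8),  # mostly redundant
    ]
    print(select_sentences(candidates))
```

Coverage features push the probability up and redundancy features push it down, which matches the intuition that a good summary sentence adds information about both the text and the pictures without repeating what is already selected.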

Journal Papers


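A Chinese Medical Question-Answer Matching Method Based on Knowledge Graph and Keyword Attention Mechanism (基于知识图谱与关键词注意机制的中文医疗问答匹配方法)

DOI: 10.16451/j.cnki.issn1003-6059.202108006
Published: 2021
Journal: Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Impact factor: --
Authors: Qiao Kai (乔凯); Chen Kejia (陈可佳); Chen Jingqiang (陈景强)
Corresponding author: Chen Jingqiang (陈景强)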

A news image captioning approach based on multimodal pointer-generator network

DOI: 10.1002/cpe.5721
Published: 2020
Journal: Concurrency and Computation-Practice & Experience
Impact factor: 2
Authors: Chen Jingqiang; Zhuge Hai
Corresponding author: Zhuge Hai

The influence of semantic link network on the ability of question-answering system

DOI: 10.1016/j.future.2020.02.042
Published: 2020
Journal: Future Generation Computer Systems-The International Journal of eScience
Impact factor: 7.5
Authors: Bei Xu; Hai Zhuge
Corresponding author: Hai Zhuge

Extractive summarization of documents with images based on multi-modal RNN

DOI: 10.1016/j.future.2019.04.045
Published: 2019-10-01
Journal: Future Generation Computer Systems-The International Journal of eScience
Impact factor: 7.5
Authors: Jingqiang Chen; Hai Zhuge
Corresponding author: Hai Zhuge
