权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Vision and language cross-modal for training conditional GANs with long-tail data.

使用长尾数据训练条件 GAN 的视觉和语言跨模式。

基本信息

批准号：
22K17947
负责人：
ヴォミンデュク
金额：
$ 1.66万
依托单位：
The University of Tokyo
依托单位国家：
日本
项目类别：
Grant-in-Aid for Early-Career Scientists
财政年份：
2022
资助国家：
日本
起止时间：
2022-04-01 至 2024-03-31
项目状态：
已结题

来源：
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-22K17947/
关键词：
Vision and language Novel object captioning GANs External knowledge Story evaluation Dataset Conditional GANs Long-tail data

项目摘要

We learn the cross-modality between vision and language spaces. We obtained three achievements:1.We collected a set of objects' names and definitions from the open dictionary Wiktionary and used the pre-trained BERT model to embed the definitions. We incorporated this external knowledge into an image captioning model, outperforming other methods in novel object captioning task. It was published at CVPR 2022.2.We proposed a new training scheme for GANs by using flipped and non-flipped non-saturating losses. It was published in the IEEE Access journal (IF 3.476).3.We created a new dataset for story evaluation, consisting of 100k story ranking data and 46k aspect rating and reasoning collected through the Reddit website and crowd-sourcing annotation process. It was published at EMNLP 2022.

我们学习视觉和语言空间之间的交叉情态。我们取得了三方面的成果：1.从开放词典维基词典中收集了一组对象的名称和定义，并使用预先训练好的BERT模型来嵌入这些定义。我们将这种外部知识融入到图像字幕模型中，在新的对象字幕任务中表现出了比其他方法更好的性能。它发表在CVPR 20222上。我们提出了一种新的GANS训练方案，使用翻转和非翻转非饱和损失。3.我们创建了一个新的故事评价数据集，包括通过Reddit网站和众包注释过程收集的10万个故事排名数据和46k个方面评级和推理。它是在EMNLP 2022上发表的。