
Conditional Coding for Learned Image and Video Compression

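Basic information

Grant number:
508272532
Principal investigator:
Professor Dr.-Ing. Jörn Ostermann
Amount:
--
Host institution:
Institut für Informationsverarbeitung
Host institution country:
Germany
Project category:
Research Grants
Fiscal year:
Funding country:
Germany
Start and end dates:
Project status:
Ongoing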

Source:
https://gepris.dfg.de/gepris/projekt/508272532?language=en
Keywords:
Conditional Coding Learned Image Video

Project abstract

This joint research project between the Institut für Informationsverarbeitung (TNT) of the Leibniz Universität Hannover (LUH) and the Department of Computer Science of the National Yang Ming Chiao Tung University (NYCU) in Taiwan addresses end-to-end learned video compression from the perspective of conditional coding with a meta-learning-based regularization and tailoring scheme.

The arrival of deep learning has spurred a new wave of developments in end-to-end learned compression. Recent years have witnessed the success of learned image compression, with the state of the art showing better MS-SSIM results than (and comparable PSNR results to) VVC Intra. By comparison, the development of end-to-end learned video compression is still at an early stage. Most learned video codecs follow the traditional hybrid coding architecture, namely temporal prediction followed by transform-based residual coding. A recent publication indicates that although state-of-the-art learned video codecs show better results than x265, they can hardly compete with the HEVC Test Model (HM) under more realistic test conditions.

Recently, a new school of thought, known as inter-frame conditional coding, has emerged, taking end-to-end learned video coding to a new level of compression performance. The idea of conditional coding is to learn the data distribution of a coding frame conditioned on useful contextual information, in order to reach a lower conditional entropy rate and thus better compression.

The emergence of deep generative models, such as variational autoencoders (VAE) and normalizing flow models, opens up new opportunities for a paradigm shift in learning-based compression. Currently, the VAE is a popular choice for the compression backbone. As a new attempt, this joint research proposal introduces a special type of normalizing flow model, called augmented normalizing flows (ANF), for conditional coding. We choose ANF because it has been shown to be more expressive than the VAE and to include the VAE as a special case.

Another notable aspect of this joint research project is that it addresses the generalizability and adaptability of learned video codecs. Learned codecs often suffer from the domain gap between the training and the test data; that is, they may not generalize well to unseen data. In a more general sense, they can hardly achieve optimal compression for individual test images/videos, each of which can in fact be considered a distinct domain. To improve generalizability, this proposal will incorporate Noether's theorem in the form of meta-learning to learn an inductive bias that encourages decoded video frames to conserve certain latent consistency in the temporal dimension. We will also use this learned inductive bias to adapt the encoder and/or the decoder at inference time to suit individual videos. Because it is unsupervised, our approach has the striking feature of not having to signal any additional information in the bitstream.
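
The entropy argument behind conditional coding, as described in the abstract, can be illustrated with a toy calculation. The following is a minimal sketch and not the project's method: it assumes a hypothetical binary "pixel" source in which each symbol repeats the co-located value of the previous frame 90% of the time, and it compares the marginal entropy H(X) with the conditional entropy H(X | C). The function names and the data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)


def conditional_entropy(pairs):
    """H(X | C) in bits, estimated from a list of (context, symbol) pairs."""
    total = len(pairs)
    by_context = {}
    for context, symbol in pairs:
        by_context.setdefault(context, Counter())[symbol] += 1
    # Weight the entropy of each per-context distribution by P(context).
    return sum((sum(cnt.values()) / total) * entropy(cnt)
               for cnt in by_context.values())


# Hypothetical data: (previous-frame pixel, current-frame pixel).
# The current pixel equals the previous one in 90 of 100 samples.
pairs = [(0, 0)] * 45 + [(0, 1)] * 5 + [(1, 1)] * 45 + [(1, 0)] * 5

h_marginal = entropy(Counter(symbol for _, symbol in pairs))
h_conditional = conditional_entropy(pairs)

print(f"H(X)     = {h_marginal:.3f} bits/symbol")
print(f"H(X | C) = {h_conditional:.3f} bits/symbol")
```

With this toy source, coding each symbol on its own costs 1 bit, whereas an ideal coder that knows the previous-frame value needs roughly 0.47 bits per symbol. A learned conditional codec aims to exploit the same gap, with the context supplied by previously decoded frames rather than by a lookup table.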
