CAREER: A Compression-Based Approach to Learning Video Representations

Basic Information

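Award number:
1845485
Principal investigator:
Philipp Kraehenbuehl
Amount:
approximately $497,500
Host institution:
University of Texas at Austin
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2019
Funding country:
United States
Duration:
2019-06-01 to 2025-05-31
Project status:
Not yet concluded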

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1845485&HistoricalAwards=false
Keywords:
CAREER Compression Based Approach Learning

Project Abstract

An ever-increasing amount of our digital communication, media consumption, and content creation revolves around videos. We share, watch, and archive many aspects of our lives through them. However, designing and learning representations to understand these videos has proven challenging. Direct extensions of sequence or image-based convolutional neural networks to videos have yielded only moderate success. The goal of this project is to develop efficient, robust, and compact video representations. Every percent increase in the compression rate from this project translates into decreased internet traffic and more storage efficiency, reducing the massive economic and environmental costs of modern digital infrastructure. Any increase in recognition accuracy results in safer autonomous agents, more responsive surveillance and assistive technologies for the elderly, and a deeper understanding of video dynamics in sports and entertainment. Furthermore, this research will translate to the classroom through updated and new undergraduate and graduate-level courses on video recognition and compression.

The technical aim of this project is divided into four thrusts. The first thrust develops video recognition models inspired by video compression. The video compression community developed sophisticated, compact and efficient representations for video, used to store the bulk of digital media. The project will study what video compression can teach us about video representations, and how modern codec design can drive the structure of deep video models. The second thrust brings concepts from video recognition back to compression. The interplay between compression and recognition is not a one-way street. The project will investigate how video compression can be learned directly from data, side-stepping many of the manual design choices, and how video compression can learn to be robust to missing or corrupted information. The research team will develop a novel interpretation of video compression as repeated image interpolation. This interpretation opens the door to learned deep video compression algorithms. The third thrust studies the optical representation of motion for both recognition and compression tasks. At the core of both video compression and recognition lies a good representation of motion. The motion fields will be represented in a compact, compressible, temporally consistent, and easy to understand manner. Finally, the fourth thrust finds new supervisory signals, evaluation tasks, and their associated data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (12)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Long-tail Detection with Effective Class-Margins

DOI:
10.1007/978-3-031-20074-8_40
Published:
2023-01
Venue:
ArXiv
Impact factor:
0
Authors:
Jang Hyun Cho;Philipp Krähenbühl
Corresponding authors:
Jang Hyun Cho;Philipp Krähenbühl

Multimodal Virtual Point 3D Detection

DOI:
Published:
2021-11
Venue:
ArXiv
Impact factor:
0
Authors:
Tianwei Yin;Xingyi Zhou;Philipp Krähenbühl
Corresponding authors:
Tianwei Yin;Xingyi Zhou;Philipp Krähenbühl

Learning to drive from a world on rails

DOI:
10.1109/iccv48922.2021.01530
Published:
2021-05
Venue:
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Impact factor:
0
Authors:
Dian Chen;V. Koltun;Philipp Krähenbühl
Corresponding authors:
Dian Chen;V. Koltun;Philipp Krähenbühl

A Multigrid Method for Efficiently Training Video Models

DOI:
10.1109/cvpr42600.2020.00023
Published:
2019-12
Venue:
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Impact factor:
0
Authors:
Chao-Yuan Wu;Ross B. Girshick;Kaiming He;Christoph Feichtenhofer;Philipp Krähenbühl
Corresponding authors:
Chao-Yuan Wu;Ross B. Girshick;Kaiming He;Christoph Feichtenhofer;Philipp Krähenbühl

Long-Term Feature Banks for Detailed Video Understanding

DOI:
10.1109/cvpr.2019.00037
Published:
2018-12
Venue:
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Impact factor:
0
Authors:
Chao-Yuan Wu;Christoph Feichtenhofer;Haoqi Fan;Kaiming He;Philipp Krähenbühl;Ross B. Girshick
Corresponding authors:
Chao-Yuan Wu;Christoph Feichtenhofer;Haoqi Fan;Kaiming He;Philipp Krähenbühl;Ross B. Girshick