Completely Unsupervised Multimodal Character Identification on TV Series and Movies
Basic Information
- Grant number: 316692988
- Principal investigator:
- Funding amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research Grants
- Fiscal year: 2016
- Funding country: Germany
- Period: 2015-12-31 to 2020-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract
Automatic character identification in multimedia videos is a broad and challenging problem. Person identification serves as a foundation and building block for many higher-level video analysis tasks, such as semantic indexing, search and retrieval, interaction analysis, and video summarization. The goal of this project is to exploit textual, audio, and video information to automatically identify characters in TV series and movies without requiring any manual annotation for training character models. A fully automatic and unsupervised approach is especially appealing given the huge amount and rapid growth of available multimedia data. Text, audio, and video provide complementary cues to a person's identity and therefore allow a person to be identified more reliably than from any single modality alone.
To this end, we will address three main research questions: unsupervised clustering of speech turns (i.e. speaker diarization) and face tracks, in order to group similar tracks of the same person without prior labels or models; unsupervised identification by propagating automatically generated weak labels from various sources of information (such as subtitles and speech transcripts); and multimodal fusion of acoustic, visual, and textual cues at various levels of the identification pipeline.
While many generic approaches to unsupervised clustering exist, they are not adapted to heterogeneous audiovisual data (face tracks vs. speech turns) and do not perform as well on challenging TV series and movie content as they do on other, more controlled data. Our general approach is therefore to first over-cluster the data, ensuring that clusters remain pure, before assigning names to these clusters in a second step. On top of domain-specific improvements for each modality, we expect joint multimodal clustering to take advantage of all three modalities and to improve clustering performance over each modality alone.
Unsupervised identification then aims at assigning character names to clusters in a completely automatic manner (i.e. using only information already present in the speech and video). In TV series and movies, character names are usually introduced and reiterated throughout the video. We will detect and exploit addresser-addressee relationships in both speech transcripts (using named entity detection techniques) and video (using mouth movements, viewing direction, and focus of attention of faces). This makes it possible to assign names to some clusters, learn discriminative models, and then assign names to the remaining clusters.
For evaluation, we will extend and further annotate a corpus of four TV series (57 episodes) and one movie series (8 movies), a total of about 50 hours of video. This diverse data covers different filming styles, types of stories, and challenges present in both video and audio. We will evaluate the different steps of this project on this corpus and make our annotations publicly available to other researchers in the field.
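As an illustration of the over-cluster-then-name strategy sketched in the abstract, the following minimal Python example (not the project's actual implementation; all function names, thresholds, and the toy data are illustrative assumptions) clusters embedding vectors of face tracks or speech turns with a conservative agglomerative distance threshold and then propagates weak name labels, as could be extracted from subtitles or transcripts, to the resulting clusters by majority vote.

```python
import numpy as np
from collections import Counter, defaultdict
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def over_cluster(embeddings, distance_threshold=0.3):
    """Agglomerative over-clustering: a conservative (low) threshold yields
    many small but pure clusters rather than a few large, mixed ones."""
    condensed = pdist(embeddings, metric="cosine")   # pairwise cosine distances
    tree = linkage(condensed, method="average")      # average-linkage dendrogram
    return fcluster(tree, t=distance_threshold, criterion="distance")


def propagate_weak_labels(cluster_ids, weak_labels, min_purity=0.6):
    """Name each cluster by majority vote over the weak labels attached to
    its tracks (None = no weak label available for that track)."""
    votes = defaultdict(Counter)
    for cid, name in zip(cluster_ids, weak_labels):
        if name is not None:
            votes[cid][name] += 1
    names = {}
    for cid, counter in votes.items():
        name, count = counter.most_common(1)[0]
        if count / sum(counter.values()) >= min_purity:  # keep only confident assignments
            names[cid] = name
    return names  # unnamed clusters are left for a later, model-based step


# Toy example: six track embeddings from two characters, with sparse weak labels.
tracks = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                   [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
weak = ["ALICE", None, "ALICE", None, "BOB", None]

cluster_ids = over_cluster(tracks)
print(propagate_weak_labels(cluster_ids, weak))  # e.g. {1: 'ALICE', 2: 'BOB'}
```

In the pipeline described above, clusters that remain unnamed after this weak-label step would subsequently be labelled by discriminative models trained on the already-named clusters.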
Project Outcomes
Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Classification-Driven Dynamic Image Enhancement
- DOI: 10.1109/cvpr.2018.00424
- Publication date: 2017-10
- Journal:
- Impact factor: 0
- Authors: Vivek Sharma;Ali Diba;D. Neven;M. S. Brown;L. Gool;R. Stiefelhagen
- Corresponding author: Vivek Sharma;Ali Diba;D. Neven;M. S. Brown;L. Gool;R. Stiefelhagen
Self-supervised Face-Grouping on Graphs
- DOI: 10.1145/3343031.3351071
- Publication date: 2019-10
- Journal:
- Impact factor: 0
- Authors: Veith Röthlingshöfer;Vivek Sharma;R. Stiefelhagen
- Corresponding author: Veith Röthlingshöfer;Vivek Sharma;R. Stiefelhagen
Clustering based Contrastive Learning for Improving Face Representations
- DOI: 10.1109/fg47880.2020.00011
- Publication date: 2020-04
- Journal:
- Impact factor: 0
- Authors: Vivek Sharma;Makarand Tapaswi;M. Sarfraz;R. Stiefelhagen
- Corresponding author: Vivek Sharma;Makarand Tapaswi;M. Sarfraz;R. Stiefelhagen
Video Face Clustering With Self-Supervised Representation Learning
- DOI: 10.1109/tbiom.2019.2947264
- Publication date: 2020-04
- Journal:
- Impact factor: 0
- Authors: Vivek Sharma;Makarand Tapaswi;M. Sarfraz;R. Stiefelhagen
- Corresponding author: Vivek Sharma;Makarand Tapaswi;M. Sarfraz;R. Stiefelhagen
Self-Supervised Learning of Face Representations for Video Face Clustering
- DOI: 10.1109/fg.2019.8756609
- Publication date: 2019-03
- Journal:
- Impact factor: 0
- Authors: Vivek Sharma;Makarand Tapaswi;M. Sarfraz;R. Stiefelhagen
- Corresponding author: Vivek Sharma;Makarand Tapaswi;M. Sarfraz;R. Stiefelhagen
Other Publications by Professor Dr.-Ing. Rainer Stiefelhagen
Other Grants by Professor Dr.-Ing. Rainer Stiefelhagen
Automatic Alignment of Text-to-Video for Semantic Multimedia Analysis
- Grant number: 252286362
- Fiscal year: 2014
- Funding amount: --
- Project category: Research Grants
Similar Overseas Grants
Data-driven phenotyping of central disorders of hypersomnolence with unsupervised clustering: toward more reliable diagnostic criteria
- Grant number: 481046
- Fiscal year: 2023
- Funding amount: --
- Project category:
CRCNS Research Proposal: A Unified Framework for Unsupervised Sparse-to-dense Brain Image Generation and Neural Circuit Reconstruction
- Grant number: 2309073
- Fiscal year: 2023
- Funding amount: --
- Project category: Continuing Grant
FRR: Collaborative Research: Unsupervised Active Learning for Aquatic Robot Perception and Control
- Grant number: 2237577
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding
- Grant number: 2237640
- Fiscal year: 2023
- Funding amount: --
- Project category: Continuing Grant
Knockoff Feature Selection Techniques for Robust Inference in Supervised and Unsupervised Learning
- Grant number: 2310955
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Unsupervised machine learning classification of ADHD subtype by urinary levels of tryptophan and monoamine neurotransmitters
- Grant number: 23K12814
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Early-Career Scientists
Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
- Grant number: 23KJ0565
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for JSPS Fellows
Unsupervised Annotation of Complex 3D BioMedical Data
- Grant number: 2882348
- Fiscal year: 2023
- Funding amount: --
- Project category: Studentship
Using synthetic data and unsupervised learning methods for malware detection
- Grant number: 10076857
- Fiscal year: 2023
- Funding amount: --
- Project category: Collaborative R&D