Using Video Information Retrieval (VIR) for "decoding" video-based content

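Basic information

Grant number:
2604208
Principal investigator:
Amount:
--
Host institution:
Queen Mary University of London
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2021
Funding country:
United Kingdom
Duration:
2021 to (no data)
Project status:
Completed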

Source:
https://gtr.ukri.org/projects?ref=studentship-2604208
Keywords:
Using Video Information Retrieval VIR

Project abstract

Media broadcasting has been in very high demand over the last decade or so. Even before the pandemic, people spent a great deal of time watching their favourite programs. TV broadcasters and Video on Demand (VoD) platforms have increased their content dramatically in order to meet viewers' expectations.

At Channel4, one of our strategic pillars is to have the best possible understanding of our customers' behaviour. Viewers' expectations have changed dramatically, and we need to put viewers' motivations at the heart of our decision-making (Future4, 2021). To better understand who our customers are, we need to focus on their interactions with our content. As a result, a deep understanding of our content is a prerequisite for decoding viewers' preferences.

Music Information Retrieval (MIR) has been researched extensively. Music has facets that "are not just to be found within the music itself, in the form of melody, harmony, tempo, and timbre, but are interpreted by the cognitive processes of the listener within frameworks of culturally agreed rules, such as genre, style, and mood" (Inskip, 2011). By contrast, there is very little research on retrieving information from video-based content. This proposal applies a comparable content-based Information Retrieval principle, but to video rather than music.

The whole broadcasting community can benefit from this work. VoD platforms and TV broadcasters should give their viewers a key role in their decision-making systems, rather than merely sending them recommendations and communications based on demographics. Viewers' interactions with video-based content can become the richest source of information, provided that Machine Learning techniques are used to extract insights from the enormous amounts of data and metadata generated by millions of viewings every day.

Lee et al. (2016) highlighted several potential improvements to the accuracy of a model that predicts the success of a movie; for example, introducing "unexplored features" can raise the model's accuracy (Lee et al., 2016).

For this project, I will focus on three main roadmaps. Firstly, I propose segmenting video-based content into micro-genres. Although very broad genres are still useful, Channel4 needs to follow an approach similar to that of Netflix, which has defined over 76,000 micro-genres (Janelle K, 2020). I consider efficient content segmentation a very important task for every broadcaster, whether linear or on-demand, because it feeds their Machine Learning models with features that help improve accuracy. I propose using Computer Vision and cutting-edge Machine Learning to define micro-genres that describe Channel4's video-based content.

Secondly, Information Retrieval can be achieved by applying Machine Learning to the scripts and dialogues of the content, as well as to the video itself. I propose using Natural Language Processing and Computer Vision to extract the emotions that a video conveys to its viewers. Sentiment analysis of scripts, dialogues, and videos can be a very useful source of information and might reveal additional aspects of the content that strengthen a Machine Learning model's predictive power.

Thirdly, I aim to identify the different modes in which users watch video-based content. By applying cutting-edge Machine Learning algorithms, I propose segmenting viewings according to their context, for example whether they take place during a commute, after work, or during a lunch break. The challenge is to manage and pre-process all the metadata that users leave behind after their viewings. Advanced coding will be needed to transform this large amount of unstructured viewing metadata into dimensions that help a Machine Learning model generate valuable insights.
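The third roadmap, segmenting viewings by context and turning raw viewing metadata into model-ready dimensions, can be illustrated with a short Python sketch. It is a hypothetical illustration, not Channel4's data model or the method this project will use: it assumes a made-up pipe-delimited log format (`user_id|ISO timestamp|programme_id`), and the hour ranges in `CONTEXT_HOURS` are illustrative guesses. A simple rule on the local start hour stands in for the Machine Learning segmentation described above; the output, one vector per user giving the share of viewings in each context, is the kind of dimension such a model could consume.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical viewing contexts keyed by local start hour (an assumption,
# not an official definition). Each range is [start, end) in 24-hour time.
CONTEXT_HOURS = {
    "morning_commute": range(7, 10),
    "lunch_break": range(12, 14),
    "evening_commute": range(17, 19),
    "after_work": range(19, 24),
}
# Fixed feature order: every user vector follows this list.
CONTEXTS = list(CONTEXT_HOURS) + ["other"]


def viewing_context(start: datetime) -> str:
    """Rule-based placeholder for ML segmentation: map a start time to a context."""
    for context, hours in CONTEXT_HOURS.items():
        if start.hour in hours:
            return context
    return "other"


def parse_log_line(line: str) -> tuple[str, datetime, str]:
    """Parse one hypothetical 'user_id|ISO timestamp|programme_id' record."""
    user_id, timestamp, programme_id = line.strip().split("|")
    return user_id, datetime.fromisoformat(timestamp), programme_id


def user_context_features(lines: list[str]) -> dict[str, list[float]]:
    """Turn raw log lines into one vector per user: the share of that
    user's viewings that fall in each context, in CONTEXTS order."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        if not line.strip():
            continue  # skip blank records
        user_id, start, _ = parse_log_line(line)
        counts[user_id][viewing_context(start)] += 1

    features = {}
    for user_id, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for contexts the user never watched in.
        features[user_id] = [counter[c] / total for c in CONTEXTS]
    return features
```

For example, a user with one viewing starting at 08:15 and two starting at 20:30 and 21:00 would get a share of 1/3 for `morning_commute` and 2/3 for `after_work`. Using proportions rather than raw counts keeps heavy and light viewers comparable; in a fuller pipeline these vectors could be clustered or joined with content features such as micro-genres.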
