Structural analysis and interactive composition of visual media

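Basic information

Grant number:
EP/J009830/1
Principal investigator:
Ralph Martin
Amount:
$122,100
Host institution:
CARDIFF UNIVERSITY
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2012
Funding country:
United Kingdom
Project period:
2012 to (no data)
Project status:
Completed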

Source:
https://gtr.ukri.org/projects?ref=EP%2FJ009830%2F1
Keywords:
Structural analysis interactive composition visual

Project abstract

This project represents joint work between 12 leading Chinese Universities, and several other invited key partners in the UK and US. The Internet, and other large-scale databases, form a significant resource of what may be termed "visual media": images, videos, 3D shape models, and so on. Internet text searches usually produce useful results. However, it can be much more difficult to find visual media, e.g. videos with specific content, or images similar to a picture in one's mind's eye. This is partly due to the fact that most image search is based on text inputs, and partly due to the difficulty of classifying pictures. It is easy for humans to "know" what an image contains, but image understanding by computer requires many tricky tasks - splitting an image into separate objects, and analysing their colour, their shape, and many other attributes. Better solutions to search of visual media would enable many applications in addition to search itself, and we will also look at one of them - the re-use of existing visual media when creating new visual media.

This project has four main goals. The first is to investigate new approaches to structural analysis of visual media. This will include devising methods to find salient information (for example, what is the main object? what is irrelevant background? how is this object composed of parts?), and methods which process the information on different scales (small details may be just as important as overall shape, for example). The aim is to come up with hierarchical descriptions of the important information in visual media.

The second is to find efficient new approaches to comparing, classifying and searching visual media, based on the above hierarchical descriptions. We will also look at how sketches can be used as a much more powerful means than text of allowing users to describe what they want to find when searching.

The third area to be considered is editing and resynthesis of visual media. Structural analysis will provide more meaningful ways to select parts of an image than just, for example, all parts of the scene with a certain colour. In turn, this will simplify the process of editing visual media. Users will be able to apply consistent editing to scene elements with similar meaning (e.g. the user controls bending of one finger, and the computer applies a similar bend to the rest of the fingers of a hand, despite minor shape differences). More powerful search will also allow elements to be rapidly retrieved from visual media databases or the Internet to be combined into new scenes, or to be included within existing images, with suitable adjustment for different lighting, etc. When video is processed, further considerations will be needed to ensure results are consistent over time, and smoothly vary as time progresses; the vast amounts of data involved in video processing make this a challenging problem.

The final area of work concerns the use of machine learning techniques to assist with all of the previous goals. The aim here is to automatically learn to recognize complex patterns, permitting software to make intelligent decisions based on visual data. Ultimately, a careful balance must be struck in which the user is firmly in control of the creative process, but the computer makes it easy for the user to produce the desired results.
