Learning reliable representations when proxy objectives fail

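Basic information

Grant number:
2665673
Principal investigator:
Amount:
--
Host institution:
University of Edinburgh
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2017
Funding country:
United Kingdom
Start and end dates:
2017 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=studentship-2665673
Keywords:
Learning reliable representations when proxy

Project abstract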

Representation learning involves using an objective to learn a mapping from data space to a representation space. When the downstream task for which a mapping must be learned is unknown, or is too costly to cast as an objective, we must rely on proxy objectives for learning. In this thesis I focus on representation learning for images, and address three cases where proxy objectives fail to produce a mapping that performs well on the downstream tasks.

When learning neural network mappings from image space to a discrete hash space for content-based image retrieval, a proxy objective is needed which captures the requirement for relevant responses to be nearer to the hash of any query than irrelevant ones. At the same time, it is important to ensure an even distribution of image hashes across the whole hash space for efficient information use and high discrimination. Proxy objectives fail when they do not meet these requirements. I propose using a standard classifier to predict class labels and converting these to a binary representation, which gives state-of-the-art performance on the image retrieval task. I also propose a binary deep decision tree layer (DDTL) to model further intra-class differences and produce approximately evenly distributed hash codes. The DDTL requires no discretisation during learning and produces hash codes that enable better discrimination between data in the same class when compared to previous methods, while remaining robust to real-world augmentations in the data space.

In the scenario where we require a neural network to partition the data into clusters that correspond well with ground-truth labels, a proxy objective is needed to define how these clusters are formed. One such proxy objective involves maximising the mutual information between cluster assignments made by a neural network from multiple views. In this context, views are different augmentations of the same image and the cluster assignments are the representations computed by a neural network. I demonstrate that this proxy objective produces parameters for the neural network that are sub-optimal, in that a better set of parameters can be found using the same objective and a different training method. I introduce deep hierarchical object grouping (DHOG) as a method to learn a hierarchy (in the sense of easy-to-hard orderings, not structure) of solutions to the proxy objective and show how this improves performance on the downstream task.

When the training data contain features from which class predictions are easy to compute (e.g., background colour) alongside features from which they are relatively harder to compute (e.g., digit type), standard classification objectives (e.g., cross-entropy) fail to produce robust classifiers. The problem is that a model that learns to rely on 'easy' features will also ignore 'complex' features (easy versus complex are purely relative in this case). I introduce latent adversarial debiasing (LAD) to decouple easy features from the class labels. LAD first models the underlying structure of the training data as a latent representation using a vector-quantised variational autoencoder. It then uses a gradient-based procedure to adjust the features in this representation so as to confuse the predictions of a constrained classifier trained to predict class labels from the same representation. The adjusted representations of the data are then decoded to produce an augmented training dataset that can be used for training in a standard manner.

I show in the aforementioned scenarios that proxy objectives can fail and demonstrate that alternative approaches can mitigate the associated failures. I suggest an analytic approach to understanding the limits of proxy objectives for every use case, in order to make the necessary adjustments to the data or the objectives and ensure good performance on downstream tasks.
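
As an illustration of the clustering proxy objective described in the abstract (mutual information between cluster assignments from multiple views), the following is a minimal sketch and not the thesis implementation. It assumes two views per image, soft cluster assignments given as rows of probabilities, and a symmetrised joint distribution, which is a common choice for this family of objectives. The function name `mutual_information` and the `eps` parameter are hypothetical.

```python
import math


def mutual_information(p_a, p_b, eps=1e-12):
    """Estimate mutual information (in nats) between two sets of assignments.

    p_a, p_b: lists of length n_samples; each entry is a list of n_clusters
    probabilities giving the soft cluster assignment for one view (one
    augmentation) of the same image. Hypothetical helper for illustration.
    """
    n = len(p_a)
    k = len(p_a[0])
    # Empirical joint distribution over (cluster in view A, cluster in view B).
    joint = [[sum(p_a[s][i] * p_b[s][j] for s in range(n)) / n
              for j in range(k)] for i in range(k)]
    # Symmetrise (assumption): the ordering of the two views is arbitrary.
    joint = [[(joint[i][j] + joint[j][i]) / 2.0 for j in range(k)]
             for i in range(k)]
    # Clip to avoid log(0), then renormalise so the entries sum to one.
    joint = [[max(v, eps) for v in row] for row in joint]
    total = sum(sum(row) for row in joint)
    joint = [[v / total for v in row] for row in joint]
    # Marginal distributions of each view's cluster assignment.
    marg_a = [sum(row) for row in joint]
    marg_b = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    # I(A; B) = sum_ij P_ij * log(P_ij / (P_i * P_j)).
    return sum(joint[i][j] * math.log(joint[i][j] / (marg_a[i] * marg_b[j]))
               for i in range(k) for j in range(k))
```

Under these assumptions, the quantity is largest when both views agree and the clusters are used evenly, and it is zero when the assignments carry no information about each other. A training loop would maximise this value with respect to the network parameters; that part is omitted here.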
