
Incisive Tagging: Humans-in-the-Loop in Selection and Labelling of Remote Sensing Data Sets


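Basic information

Grant number:
2440657
Principal investigator:
Amount:
--
Host institution:
Swansea University
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2020
Funding country:
United Kingdom
Start and end dates:
2020 to (no data)
Project status:
Not yet concluded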

Source:
https://gtr.ukri.org/projects?ref=studentship-2440657
Keywords:
Incisive Tagging Humans Loop Selection

Project abstract

We are developing a pre-trained deep neural network to function 'under the hood' of multiple solutions for extracting geospatial information from remote sensing imagery. So far, better results have been achieved with larger data sets. However, we observe that much of the imagery contains little apparent unique information, and we are interested in developing a way to select only the pivotal examples for training. Further, we are keen to work more effectively with our in-house image interpretation experts, using their experience and specific abilities in ways that are rewarding and motivating. Within our work, image interpreters may be labelling data for training networks, and hence may be best deployed labelling the most pivotal examples. Another application is to present image interpreters with multiple examples of image clips that highly activate specific parts of the network and ask them to provide their own interpretation of the representations learned by the deep network.

Our questions (not all of which may be addressed in this PhD):
Can we improve sample efficiency, even with an unsupervised target?
Can we actively label data, presenting humans with the most pivotal examples for labelling?
Can we make labelling enjoyable, using our experts most effectively?

Aims and Intended Impact
More efficient training and updating of machine learning models with remote sensing data
More efficient and rewarding labelling of training examples
More human-interpretable neural networks

This project will be grounded in investigating the interplay of human creativity, intelligence and fulfilment with efficient ML tools. The work will begin with, and iterate around, a deep and intensive understanding of the labellers' perspectives on the task, leading to prototypes and evaluations. These prototypes might involve novel gamification (e.g. [1]); visualisation techniques; or even the use of multiple modalities, e.g. from simple gestures to emotional state recognition [2], to provide input to ML tools.
The interaction design of the labelling tool will inform and be informed by algorithmic innovations within the ML tool. For instance, one approach to making the task more efficient, more immersive and less onerous would be to make pivotal examples easier to spot: we can present samples clustered by similarity as groups of thumbnails, allowing the labeller to spot outliers faster. Another strategy might be to use active labelling: given a small labelled data set, can we present larger sets to users and gain their feedback to (a) label large amounts of data quickly and (b) as a result make the labelling task less onerous?

We might also consider how to improve sample efficiency, that is, reducing the number of samples required without reducing the efficacy of the approach. Some theoretical models of sample complexity have been investigated [3]. Monte Carlo techniques are a suggested research direction. For example, Importance Sampling has long been studied in Path Tracing, and more advanced techniques such as Hamiltonian Monte Carlo or Gradient Domain methods [4] demonstrate orders-of-magnitude performance gains through a great reduction in required samples. Ensembles are used to increase robustness and stability, and lend themselves well to importance sampling. We could also examine the literature on robust statistics and M-estimators as methods for drawing samples (see [5] for a recent review). In these sorts of investigation, we will draw on the labellers' experience and insights to supplement any quantitative or theoretical evaluations of the power or limitations of the proposed approaches.

The human-centred improvements discussed above could also drive machine performance improvement. Models suffer from long training times. There is potential for the current problem size to be compressed so that it just fits into GPU memory, to improve cache coherency during training.
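
As an illustration only, the active labelling idea described above (showing human labellers the most pivotal examples first) could be sketched with entropy-based uncertainty sampling. This is one common selection heuristic and not necessarily the method the project will adopt. The function names, the clip identifiers, and the probability values below are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labelling(predictions, budget):
    """Return the ids of the `budget` clips with the highest predictive entropy.

    `predictions` maps a clip id to the model's class probabilities for
    that clip. High entropy means the model is unsure, so a human label
    for that clip is assumed to be more informative.
    """
    ranked = sorted(
        predictions,
        key=lambda clip_id: entropy(predictions[clip_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabelled image clips
# over three made-up land-cover classes.
predictions = {
    "clip_a": [0.98, 0.01, 0.01],  # confident prediction
    "clip_b": [0.34, 0.33, 0.33],  # almost uniform, very uncertain
    "clip_c": [0.70, 0.20, 0.10],
    "clip_d": [0.50, 0.49, 0.01],
}

to_label = select_for_labelling(predictions, budget=2)
print(to_label)
```

In this sketch the labeller would be shown `clip_b` and `clip_c` first, while the confidently predicted `clip_a` is left for later. A practical tool would probably also need to account for diversity among the selected clips, which this sketch ignores.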
