
Detecting Training Abuses in Neural Nets


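Basic information

Grant number:
2301656
Principal investigator:
Amount:
--
Host institution:
University of Sheffield
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2019
Funding country:
United Kingdom
Duration:
2019 to (no data)
Project status:
Completed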

Source:
https://gtr.ukri.org/projects?ref=studentship-2301656
Keywords:
Detecting Training Abuses Neural Nets

Project abstract

Many military systems carry out classification tasks. For example, a system might be required to distinguish between an allied tank and an enemy tank (a classic problem in machine learning). Modern machine learning approaches are being brought to bear on classification problems within the military domain and more widely. Neural networks, a technology that, loosely speaking, makes decisions in a manner analogous to the way the human brain works, play a particularly prominent role. Most work proceeds on the assumption that all is benign. But imagine that an enemy wanted your classifier to work well except when presented with a very specific classification task. For example, an enemy tank with a particular appearance could be classified as an allied one, with very significant consequences. Can an enemy engineer such behaviour? In certain circumstances, yes! It depends on how and by whom the classifier system was built. The building of such systems is often outsourced in some way, e.g. because the procurer lacks the computational capability to craft an effective system, or because publicly generated components are used. We often refer to hidden malicious functionality that can be invoked when convenient as a 'trapdoor'. Trapdoors are often very difficult to detect. Imagine a system that perfectly classified thousands of tank examples provided by you. It seems like a very good system. But the system may have been trained so that an enemy tank with "666" painted on its side is misclassified. If you don't know this specific condition, you would have little reason to generate a test example to discover it. Neural networks are also notoriously opaque about how they make decisions, which makes this sort of trapdoor detection particularly hard. We might reasonably ask whether, or how well, we can detect such trapdoors. There are various levels at which understanding may be sought.
Thus, determining whether a system has a trapdoor in it (yes/no) is a simpler and less ambitious task than seeking the specific trapdoor condition (the "666" indicated above). Though there is a fair amount on trapdoors in the literature, typically addressing issues of planting or detecting trapdoors, there appears to be little concerned with characterising them. It would seem clear that any detection technique is likely to be more successful on some trapdoors than on others. This raises the question, however, of how to describe those trapdoors on which the technique works well and those on which it performs less well. A rigorous approach to detection, the primary goal of this project, requires a nuanced understanding of trapdoors. In particular, a characterisation of trapdoors together with measurements of their properties, e.g. how much a trapdoor example deviates from a normal example, is essential. If trapdoor generation is now considered, the characterisation of trapdoors allows a more refined specification of the properties we would like an inserted trapdoor to have. This serves two purposes: firstly, it facilitates a more nuanced generational capability for practical operational purposes, i.e. for someone who wants to benefit from planting a trapdoor in the real world; and secondly, it allows researchers (initially ourselves!) to generate sets of trapdoors for rigorous evaluation of detection techniques. We can define what it means to 'cover' the trapdoor space in some way, much as we cover the input space or other spaces in general testing. Since there is no extant workable characterisation of trapdoors, there is also clearly no extant generational capability.
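
The two ideas in the abstract (a trapdoor that clean test data never exposes, and a measurement of how far a trapdoor example deviates from a normal one) can be illustrated with a toy sketch. Nothing here comes from the project itself: the rule-based `trapdoored_classifier` merely stands in for a trained network, the 2x2 `TRIGGER` patch stands in for the painted "666", and every name, image size and threshold is a hypothetical choice made for illustration.

```python
import math
import random

# Hypothetical trigger: a 2x2 checkerboard standing in for the painted "666".
TRIGGER = [[1.0, 0.0], [0.0, 1.0]]
TOP, LEFT = 0, 0  # assumed position of the trigger in the image


def make_image(low, high, size=8, rng=random):
    """Random grey-scale image with pixel values drawn from [low, high]."""
    return [[rng.uniform(low, high) for _ in range(size)] for _ in range(size)]


def stamp_trigger(image):
    """Return a copy of `image` with TRIGGER pasted at (TOP, LEFT)."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(TRIGGER):
        for j, value in enumerate(patch_row):
            out[TOP + i][LEFT + j] = value
    return out


def has_trigger(image):
    """True if the exact trigger pattern is present at (TOP, LEFT)."""
    return all(image[TOP + i][LEFT + j] == value
               for i, patch_row in enumerate(TRIGGER)
               for j, value in enumerate(patch_row))


def trapdoored_classifier(image):
    """Toy stand-in for a network that was trained with a trapdoor."""
    if has_trigger(image):
        return "allied"  # hidden behaviour, invoked only by the trigger
    pixels = [p for row in image for p in row]
    # Honest rule (an assumption of this toy): bright images are enemy tanks.
    return "enemy" if sum(pixels) / len(pixels) > 0.5 else "allied"


def deviation(normal, trapdoor):
    """Measure how much a trapdoor example deviates from a normal example."""
    diffs = [abs(a - b)
             for row_n, row_t in zip(normal, trapdoor)
             for a, b in zip(row_n, row_t)]
    return {
        "l2": math.sqrt(sum(d * d for d in diffs)),
        "linf": max(diffs),
        "changed_fraction": sum(d > 0 for d in diffs) / len(diffs),
    }


def clean_accuracy(n=1000, seed=0):
    """Accuracy on trigger-free examples, as a procurer's test set would see."""
    rng = random.Random(seed)
    correct = 0
    for k in range(n):
        label = "enemy" if k % 2 else "allied"
        low, high = (0.6, 0.9) if label == "enemy" else (0.1, 0.4)
        correct += trapdoored_classifier(make_image(low, high, rng=rng)) == label
    return correct / n
```

Calling `clean_accuracy()` scores the classifier on trigger-free images only, which is why the trapdoor stays hidden, whereas `trapdoored_classifier(stamp_trigger(enemy_image))` returns "allied" for a bright enemy image. `deviation(enemy_image, stamp_trigger(enemy_image))` then reports three simple candidate measurements (L2 distance, largest single-pixel change, fraction of pixels changed); which properties actually characterise trapdoors well is the open question the abstract raises.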
