
Disentangling generalization in deep neural networks


Basic information

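Grant number:
2872706
Principal investigator:
Funding amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2023
Funding country:
United Kingdom
Duration:
2023 to (no data)
Project status:
Ongoing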

Source:
https://gtr.ukri.org/projects?ref=studentship-2872706
Keywords:
Disentangling generalization deep neural networks

Project abstract

Despite the excellent empirical performance of deep artificial neural networks (ANNs) on a wide variety of tasks, including in vision and language modelling, the underlying principles that allow over-parameterised ANNs to both memorise training data and generalise to unseen data are not well understood. A standard result that infinite-width ANNs are universal function approximators renders many standard measures of generalisation ability uninformative in the case of ANNs. The inability to measure generalization performance (except on some subset of the data), or guarantee that the training data contains all possible examples of a phenomenon, makes the increased deployment of ANNs in safety-critical areas, such as finance and healthcare, risky. This project will attempt to analyse the generalisation performance of ANNs by disentangling the effects of the training data, the model architecture, and the training algorithm. The project will also attempt to develop new measures of generalisation performance that are more informative than existing measures. The project will be supervised by Prof. Varun Kanade.
Machine learning researchers will often refer to the "inductive bias" of a model to explain why the trained model expresses a function that generalises well, rather than another function that fits the training data but does not generalise. However, while some inductive biases are easy to define and understand, such as translation invariance in convolutional neural networks, others are more difficult, such as the apparent simplicity bias of simple feedforward architectures. Previous studies have tended to focus on only one aspect of the inductive bias of a model, such as the architecture, while holding other factors constant, making it hard to determine the relative importance of each factor on generalisation.
This project will involve mathematical analysis of a number of architectures and optimisation methods, on several different types of data, to determine the relative contribution of each of these factors to the generalisation performance of the trained model. This will lead to the development of theoretically motivated generalisation bounds that can provide guidance and guarantees on when deployed models will perform as expected, and when they may fail. The successful completion of this project could have a significant impact on increasing trust in ANNs, and ensuring they are deployed safely.
This project falls within the EPSRC artificial intelligence technologies and theoretical computer science research areas.
