Training Unstructured Sparse Neural Networks


Basic Information

  • Grant Number:
    RGPIN-2022-03120
  • Principal Investigator:
  • Amount:
    $18,200
  • Host Institution:
  • Host Institution Country:
    Canada
  • Program Type:
    Discovery Grants Program - Individual
  • Fiscal Year:
    2022
  • Funding Country:
    Canada
  • Duration:
    2022-01-01 to 2023-12-31
  • Status:
    Completed

Project Abstract

The Increasing Cost of Training Deep Neural Networks

Deep Neural Networks (DNNs) are behind the intelligence in contemporary technology, enabling us to search our photos, ask questions of smart assistants, or use machine translation to understand a foreign language. DNNs have a fundamental problem, however: they are very expensive in cost, energy usage, and time, both during training (learning a task) and inference (application). The state-of-the-art model for Natural Language Processing (NLP), GPT-3, has 175 billion parameters and is estimated to have cost more than $4.6M USD to train. The trend is for training DNNs to become ever more expensive: the growth in the cost of training state-of-the-art DNN models has outpaced even the exponential growth in transistor technology that governs our computational capacity, i.e. Moore's Law.

Proposed Program of Research: Enabling Sparse DNN Training

DNNs are expensive primarily because of the empirically established, but poorly understood, requirement to over-parameterize DNN models during training to achieve good generalization (performance on unseen data). We know these models are over-parameterized because, after training, 80-95% of the learned weights (parameters) of DNN models can be removed (pruned) without significant loss of generalization [16]. Unstructured pruning, that is, removing unnecessary individual weights from a DNN, can be highly effective at reducing the size and inference cost of pre-trained DNNs used in applications. However, attempting to train an unstructured sparse DNN from random initialization, just as dense (standard) DNNs are trained, rarely matches the generalization of dense training in practice. For this reason, unstructured sparsity has not yet played a significant role in decreasing the cost of training DNNs. Developing effective approaches for efficient sparse training would make DNN training cheaper, faster, more repeatable, and, importantly, more accessible to both researchers and new applications.

Efficient Deep Learning and Society: Adversarial Robustness and Bias of Efficient DNNs

DNNs used in industrial applications already differ drastically from those proposed in most academic papers, in their use of efficient-DNN methods (e.g. quantization, pruning, and distillation) and efficient DNN architectures. Yet there is little work exploring the differences between well-studied academic models and industry-deployed efficient DNNs beyond generalization performance and efficiency. Understanding these differences is becoming imperative given our society's increasing reliance on this technology. With an ever larger potential to affect our society, identifying and resolving issues of robustness and bias specific to efficient DNNs is an increasingly important area of research, and one relevant to a variety of potential industrial partners using DNNs in real-world applications.
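The pruning result cited above (80-95% of weights removable after training) is typically obtained by magnitude pruning. The following is a minimal NumPy sketch of unstructured magnitude pruning, not the proposal's method: the function name, sparsity level, and threshold rule are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a boolean mask keeping the largest-magnitude weights.

    Unstructured pruning removes individual weights regardless of their
    position, so the surviving pattern is irregular (no row/column or
    block structure).
    """
    k = int(weights.size * sparsity)  # number of weights to remove
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))          # stand-in for a trained weight matrix
mask = magnitude_prune(w, sparsity=0.9)  # remove 90% of weights
w_sparse = w * mask                      # pruned weights; ~10% nonzero
```

Training a sparse network from random initialization, as the proposal targets, amounts to fixing such a mask (or updating it during training) and applying it to the weights at every step, rather than pruning once after dense training.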

Project Outputs

Journal Articles (0)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)


Other Publications by Ioannou, Yani

Scientific Domain Knowledge Improves Exoplanet Transit Classification with Deep Learning
  • DOI:
    10.3847/2041-8213/aaf23b
  • Publication Date:
    2018-12-10
  • Journal:
  • Impact Factor:
    7.9
  • Authors:
    Ansdell, Megan; Ioannou, Yani; Angerhausen, Daniel
  • Corresponding Author:
    Angerhausen, Daniel

Other Grants by Ioannou, Yani

Training Unstructured Sparse Neural Networks
  • Grant Number:
    DGECR-2022-00358
  • Fiscal Year:
    2022
  • Funding Amount:
    $18,200
  • Program Type:
    Discovery Launch Supplement
Training Unstructured Sparse Neural Networks
  • Grant Number:
    DGDND-2022-03120
  • Fiscal Year:
    2022
  • Funding Amount:
    $18,200
  • Program Type:
    DND/NSERC Discovery Grant Supplement

Similar International Grants

CRII: OAC: Dynamically Adaptive Unstructured Mesh Technologies for High-Order Multiscale Fluid Dynamics Simulations
  • Grant Number:
    2348394
  • Fiscal Year:
    2024
  • Funding Amount:
    $18,200
  • Program Type:
    Standard Grant
Archer: Next-generation unstructured data access for hospitals and clinical trial sponsors, delivering efficiency, reducing costs and improving care
  • Grant Number:
    10096804
  • Fiscal Year:
    2024
  • Funding Amount:
    $18,200
  • Program Type:
    Collaborative R&D
SHF: Small: Domain-Specific FPGAs to Accelerate Unrolled DNNs with Fine-Grained Unstructured Sparsity and Mixed Precision
  • Grant Number:
    2303626
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
    Standard Grant
Generating Reproducible Real-World Evidence with Multi-Source Data to Capture Unstructured Clinical Endpoints for Chronic Diseases
  • Grant Number:
    10797849
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
Harnessing Business Insights from Unstructured Customer Data
  • Grant Number:
    DP230101490
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
    Discovery Projects
Experiments on unstructured bargaining
  • Grant Number:
    23K01318
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
    Grant-in-Aid for Scientific Research (C)
Developing a "state of the art" reasoning engine to extract forward looking and actionable insights for financial institutions from unstructured data sets
  • Grant Number:
    10054206
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
    Collaborative R&D
Frames as dictionaries in inverse problems: Recovery guarantees for structured sparsity, unstructured environments, and symmetry-group identification
  • Grant Number:
    2308152
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
    Standard Grant
SHINE: Faster Boundary-Conforming Simulations of Solar Convection on Unstructured Grids
  • Grant Number:
    2310372
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type:
    Standard Grant
Regional Oncology Research Center (LLMs for Unstructured Data Extraction)
  • Grant Number:
    10891024
  • Fiscal Year:
    2023
  • Funding Amount:
    $18,200
  • Program Type: