SHF: Medium: Training Sparse Neural Networks with Co-Designed Hardware Accelerators: Enabling Model Optimization and Scientific Exploration
Basic Information
- Award Number: 1763747
- Principal Investigator: Keith Chugg
- Amount: $1,199,800
- Host Institution:
- Host Institution Country: United States
- Project Type: Continuing Grant
- Fiscal Year: 2018
- Funding Country: United States
- Project Period: 2018-07-01 to 2023-06-30
- Project Status: Completed
- Source:
- Keywords:
Project Abstract
Machine learning systems are critical drivers of new technologies such as near-perfect automatic speech recognition, autonomous vehicles, computer vision, and natural language understanding. The underlying inference engine for many of these systems is based on neural networks. Before a neural network can be used for these inference tasks, it must be trained using a data corpus of known input-output pairs. This training process is very computationally intensive, with current systems requiring weeks to months of time on graphics processing units (GPUs) or central processing units in the cloud. As more data becomes available, the problem of long training times is further exacerbated because larger, more effective network models become desirable. The theoretical understanding of neural networks is limited, so experimentation and empirical optimization remain the primary tools for understanding deep neural networks and innovating in the field. However, the ability to conduct larger-scale experiments is becoming concentrated among a few large entities with the necessary financial and computational resources. Even for those with such resources, the painfully long experimental cycle for training neural networks means that large-scale searches and optimizations over the neural network model structure are not performed. The ultimate goal of this research project is to democratize and distribute the ability to conduct large-scale neural network training and model optimization at high speed, using hardware accelerators. Reducing the training time from weeks to hours will allow researchers to run many more experiments, gaining knowledge of the fundamental inner workings of deep learning systems. The hardware accelerators are also much more energy efficient than the existing GPU-based training paradigm, so advances made in this project can significantly reduce the energy consumption required for neural network training tasks.

This project comprises an interdisciplinary research plan that spans theory, hardware architecture and design, software control, and system integration. A new class of neural networks that have pre-defined sparsity is being explored. These sparse neural networks are co-designed with a very flexible, high-speed, energy-efficient hardware architecture that maximizes circuit speed for any model size in a given Field Programmable Gate Array (FPGA) chip. This algorithm-hardware co-design is a key research theme that differentiates this approach from previous research, which enforces sparsity during the training process in a manner incompatible with parallel hardware acceleration. In particular, the proposed architecture operates on all network layers simultaneously, executing the forward- and back-propagation passes in parallel, fully pipelined across layers. With high-precision arithmetic, a speed-up of about 5X relative to GPUs is expected. Using log-domain arithmetic, these gains are expected to increase to 100X or larger. Software and algorithms are being developed to manage multiple FPGA boards, simplifying and automating the model search and training process. These algorithms exploit the ability to reconfigure the FPGAs to trade speed for accuracy, a capability lacking in GPUs. These software tools will also serve as a bridge to popular Python libraries used by the machine learning community.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
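To make the abstract's two key ideas concrete, the sketches below are illustrative only; they are not taken from the project's codebase, and all names and parameters (mask, density, and so on) are hypothetical. The first shows pre-defined sparsity: a binary connectivity mask is fixed before training and applied to both the weights and their gradients, so the sparsity pattern never changes during training, which is what makes the computation amenable to a fixed, pipelined hardware datapath.

```python
# Minimal sketch of training with pre-defined sparsity (illustrative;
# not the project's actual implementation). The connectivity mask is
# chosen before training begins and never changes.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 784, 100
density = 0.1  # hypothetical: keep 10% of the connections

# Pre-defined sparsity: fix the binary connectivity mask up front.
mask = (rng.random((n_in, n_out)) < density).astype(np.float32)
W = rng.normal(0.0, 0.1, (n_in, n_out)).astype(np.float32) * mask

def forward(x):
    # Only the masked-in weights are ever nonzero.
    return x @ W

def sgd_step(x, grad_out, lr=0.01):
    # Masking the gradient preserves the fixed sparsity pattern,
    # so hardware can hard-wire the surviving connections.
    global W
    W -= lr * ((x.T @ grad_out) * mask)

# Toy usage: one forward pass and one update on random data.
x = rng.normal(size=(32, n_in)).astype(np.float32)
grad_out = rng.normal(size=(32, n_out)).astype(np.float32)
_ = forward(x)
sgd_step(x, grad_out)
```

The second sketch illustrates why log-domain arithmetic is attractive in FPGA logic: a multiplication becomes an addition of logarithms, trading expensive hardware multipliers for cheap adders. This is a toy demonstration of the general principle, not the project's actual number format.

```python
# Toy illustration of log-domain multiplication:
# log(a*b) = log(a) + log(b), so a multiply becomes an add.
import math

def log_mul(log_a, log_b):
    return log_a + log_b

a, b = 3.0, 7.0
assert math.isclose(math.exp(log_mul(math.log(a), math.log(b))), a * b)
```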
Project Outcomes
Journal Articles (22)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Analyzing the Confidentiality of Undistillable Teachers in Knowledge Distillation
- DOI:
- Publication Date: 2021
- Journal:
- Impact Factor: 0
- Authors: Souvik Kundu;Qirui Sun;Yao Fu;M. Pedram;P. Beerel
- Corresponding Author: Souvik Kundu;Qirui Sun;Yao Fu;M. Pedram;P. Beerel
BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: Souvik Kundu, Shikai Wang
- Corresponding Author: Souvik Kundu, Shikai Wang
Predicting Throughput of Distributed Stochastic Gradient Descent
- DOI: 10.1109/tpds.2022.3151739
- Publication Date: 2022
- Journal:
- Impact Factor: 5.3
- Authors: Zhuojin Li;Marco Paolieri;L. Golubchik;Sung-Han Lin;Wumo Yan
- Corresponding Author: Zhuojin Li;Marco Paolieri;L. Golubchik;Sung-Han Lin;Wumo Yan
Pre-Defined Sparsity for Low-Complexity Convolutional Neural Networks
- DOI: 10.1109/tc.2020.2972520
- Publication Date: 2020-01
- Journal:
- Impact Factor: 3.7
- Authors: Souvik Kundu;M. Nazemi;M. Pedram;K. Chugg;P. Beerel
- Corresponding Author: Souvik Kundu;M. Nazemi;M. Pedram;K. Chugg;P. Beerel
Performance and Revenue Analysis of Hybrid Cloud Federations with QoS Requirements
- DOI: 10.1109/cloud55607.2022.00055
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: B. Song, M. Paolieri
- Corresponding Author: B. Song, M. Paolieri
Other Grants by Keith Chugg
Two Dimensional Parallel Signaling and Detection Techniques With Applications To Volume Optical Memories
- Award Number: 9616663
- Fiscal Year: 1996
- Project Type: Standard Grant
Similar Overseas Grants
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award Number: 2321102
- Fiscal Year: 2024
- Project Type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award Number: 2321103
- Fiscal Year: 2024
- Project Type: Standard Grant
DESIGN: Creating cultural change in small to medium-sized professional societies: a training network approach
- Award Number: 2334964
- Fiscal Year: 2024
- Project Type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award Number: 2321104
- Fiscal Year: 2024
- Project Type: Standard Grant
Collaborative Research: Implementation: Medium: Secure, Resilient Cyber-Physical Energy System Workforce Pathways via Data-Centric, Hardware-in-the-Loop Training
- Award Number: 2320972
- Fiscal Year: 2023
- Project Type: Standard Grant
Collaborative Research: Implementation: Medium: Secure, Resilient Cyber-Physical Energy System Workforce Pathways via Data-Centric, Hardware-in-the-Loop Training
- Award Number: 2320975
- Fiscal Year: 2023
- Project Type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Cross-Disciplinary Training for Joint Cyber-Physical Systems and IoT Security
- Award Number: 2230086
- Fiscal Year: 2023
- Project Type: Continuing Grant
Collaborative Research: CyberTraining: Implementation: Medium: Cross-Disciplinary Training for Joint Cyber-Physical Systems and IoT Security
- Award Number: 2230087
- Fiscal Year: 2023
- Project Type: Continuing Grant
Collaborative Research: CyberTraining: Implementation: Medium: CyberInfrastructure Training and Education for Synchrotron X-Ray Science (X-CITE)
- Award Number: 2320375
- Fiscal Year: 2023
- Project Type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Cyber Training for Open Science in Climate, Water and Environmental Sustainability
- Award Number: 2230093
- Fiscal Year: 2023
- Project Type: Standard Grant