SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform
Basic Information
- Award number: 1919117
- Principal investigator:
- Amount: $350K
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-10-01 to 2024-09-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract
With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely adopted because of their high accuracy, excellent scalability, and self-adaptiveness. Many applications employ DNNs as the core technology, such as face detection, speech recognition, and scene parsing. To meet the high accuracy requirements of these applications, DNN models are becoming deeper and larger and are evolving at a fast pace. They are computation- and memory-intensive and pose serious challenges to the conventional von Neumann architecture used in computing. The key problem addressed by this project is how to accelerate deep learning: not only inference, but also training and model compression, which have not received enough attention in prior research. This endeavor has the potential to enable the design of fast and energy-efficient deep learning systems whose applications are found in our daily lives, ranging from autonomous driving, through mobile devices, to IoT systems, thus benefiting society at large.

The outcome of this project is FASTLEAP, a Field Programmable Gate Array (FPGA)-based platform for accelerating deep learning. The platform takes a dataset as input and outputs a model that is trained, pruned, and mapped onto the FPGA, optimized for fast inference. The project will utilize emerging FPGA technologies that have access to High Bandwidth Memory (HBM) and include floating-point DSP units. Vertically, FASTLEAP integrates innovations from multiple levels of the whole system stack: algorithm, architecture, and efficient FPGA hardware implementation. Horizontally, it embraces systematic DNN model compression and the associated FPGA-based training, as well as FPGA-based inference acceleration of the compressed DNN models. The platform will be delivered as a complete solution, with both a software tool chain and a hardware implementation, to ensure ease of use.

At the algorithm level of FASTLEAP, the proposed Alternating Direction Method of Multipliers for Neural Networks (ADMM-NN) framework will perform unified weight pruning and quantization, given the training data, a target accuracy, and the characteristics of the target FPGA platform (performance models, inter-accelerator communication). The training procedure in ADMM-NN is performed on a platform with multiple FPGA accelerators, dictated by the architecture-level optimizations on communication and parallelism. Finally, the optimized FPGA inference design is generated from the trained and compressed DNN model, accounting for FPGA performance modeling.

The project will address the following SPX research areas: 1) Algorithms: bridging the gap between deep learning developments in theory and system implementations that are cognizant of the platform's performance model. 2) Applications: scaling deep learning for domains such as image processing. 3) Architecture and Systems: automatic generation of deep learning designs on FPGA that optimize area, energy efficiency, latency, and throughput.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
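To make the compression step concrete: ADMM-based weight pruning, of the kind the ADMM-NN framework above builds on, rewrites constrained training as two coupled subproblems, alternating a gradient step on the loss with a Euclidean projection onto the constraint set, plus a dual update. The sketch below illustrates that loop on a toy least-squares problem; it is a minimal illustration under assumed names, data, and hyperparameters (project_topk, rho, lr), not the project's actual ADMM-NN implementation.

```python
# Minimal ADMM-based weight pruning on a toy least-squares problem.
# Illustrative only: the toy task, hyperparameters, and helper names are
# assumptions, not the FASTLEAP / ADMM-NN codebase.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: recover a sparse weight vector from X @ w ~ y.
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:5] = rng.normal(size=5)        # ground truth uses only 5 weights
y = X @ w_true + 0.01 * rng.normal(size=100)

def project_topk(v, k):
    """Euclidean projection onto {v : at most k nonzeros}:
    keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

k, rho, lr = 5, 1.0, 0.05
W = rng.normal(size=20)                # trainable weights
Z = project_topk(W, k)                 # auxiliary variable in the constraint set
U = np.zeros(20)                       # scaled dual variable

for _ in range(500):
    # W-step: one gradient step on loss(W) + (rho/2) * ||W - Z + U||^2
    grad = X.T @ (X @ W - y) / len(y) + rho * (W - Z + U)
    W -= lr * grad
    # Z-step: project (W + U) onto the sparsity constraint set
    Z = project_topk(W + U, k)
    # Dual update: accumulate the remaining constraint violation W - Z
    U += W - Z

W_pruned = project_topk(W, k)          # final hard pruning
print("nonzeros:", np.count_nonzero(W_pruned),
      "mse:", float(np.mean((X @ W_pruned - y) ** 2)))

# Quantization fits the same template: the Z-step projects onto a discrete
# set of levels instead of (or in addition to) the sparsity set.
def project_quantized(v, levels):
    """Snap each entry of v to its nearest quantization level."""
    levels = np.asarray(levels)
    return levels[np.argmin(np.abs(v[:, None] - levels[None, :]), axis=1)]
```

The appeal of this formulation, and a reason it suits unified pruning plus quantization, is that the hard combinatorial constraint is isolated in the Z-step, where the projection has a closed form, while the W-step stays ordinary gradient-based training.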
Project Results
Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding
- DOI: 10.1007/978-3-031-19775-8_3
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Geng Yuan;Sung-En Chang;Qing Jin;Alec Lu;Yanyu Li;Yushu Wu;Zhenglun Kong;Yanyue Xie;Peiyan Dong;Minghai Qin;Xiaolong Ma;Xulong Tang;Zhenman Fang;Yanzhi Wang
- Corresponding author: Geng Yuan;Sung-En Chang;Qing Jin;Alec Lu;Yanyu Li;Yushu Wu;Zhenglun Kong;Yanyue Xie;Peiyan Dong;Minghai Qin;Xiaolong Ma;Xulong Tang;Zhenman Fang;Yanzhi Wang
Advancing Model Pruning via Bi-level Optimization
- DOI: 10.48550/arxiv.2210.04092
- Publication date: 2022-10
- Journal:
- Impact factor: 0
- Authors: Yihua Zhang;Yuguang Yao;Parikshit Ram;Pu Zhao;Tianlong Chen;Mingyi Hong;Yanzhi Wang;Sijia Liu
- Corresponding author: Yihua Zhang;Yuguang Yao;Parikshit Ram;Pu Zhao;Tianlong Chen;Mingyi Hong;Yanzhi Wang;Sijia Liu
Non-Structured DNN Weight Pruning--Is It Beneficial in Any Platform?
- DOI: 10.1109/tnnls.2021.3063265
- Publication date: 2021-03-18
- Journal:
- Impact factor: 10.4
- Authors: Ma, Xiaolong;Lin, Sheng;Wang, Yanzhi
- Corresponding author: Wang, Yanzhi
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
- DOI: 10.1007/978-3-030-58601-0_37
- Publication date: 2020-01
- Journal:
- Impact factor: 0
- Authors: Xiaolong Ma;Wei Niu;Tianyun Zhang;Sijia Liu;Fu-Ming Guo;Sheng Lin;Hongjia Li;Xiang Chen;Jian Tang;Kaisheng Ma;Bin Ren;Yanzhi Wang
- Corresponding author: Xiaolong Ma;Wei Niu;Tianyun Zhang;Sijia Liu;Fu-Ming Guo;Sheng Lin;Hongjia Li;Xiang Chen;Jian Tang;Kaisheng Ma;Bin Ren;Yanzhi Wang
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
- DOI: 10.1109/cvpr52729.2023.00597
- Publication date: 2023-03
- Journal:
- Impact factor: 0
- Authors: Xuan Shen;Yaohua Wang;Ming Lin;Yi-Li Huang;Hao Tang;Xiuyu Sun;Yanzhi Wang
- Corresponding author: Xuan Shen;Yaohua Wang;Ming Lin;Yi-Li Huang;Hao Tang;Xiuyu Sun;Yanzhi Wang
Other Publications by Yanzhi Wang
Design and Evaluation of Deep Learning Accelerators Using Superconductor Logic Families
- DOI:
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: Qiuyun Xu;Yanzhi Wang;Naoki Takeuchi;Nobuyuki Yoshikawa
- Corresponding author: Nobuyuki Yoshikawa
Dynamics and control of spin entanglement
- DOI:
- Publication date: 2016
- Journal:
- Impact factor: 0
- Authors: Qiuyun Xu;Yanzhi Wang;Naoki Takeuchi;Nobuyuki Yoshikawa;井澤佳那子,羽藤英二,菊池雅彦,石神孝裕,川名義輝,杉本保男;寺地徳之,フィオリ アレクサンドレ,桐谷範彦,谷本 智,ゲラール エチェン,小出康夫;小林研介;Zenji Horita;S. Tarucha
- Corresponding author: S. Tarucha
Resource allocation optimization in a data center with energy storage devices
- DOI:
- Publication date: 2014
- Journal:
- Impact factor: 0
- Authors: Shuang Chen;Yanzhi Wang;Massoud Pedram
- Corresponding author: Massoud Pedram
Ultra-broad bandwidth low-dispersion mirror with smooth dispersion and high laser damage resistance.
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 3.6
- Authors: Yuhui Zhang;Yanzhi Wang;Yu Chen;Ye Lu;Xinliang Wang;Fanyu Kong;Zhihao Wang;Chang Chen;Yi Xu;Yuxin Leng;Hongbo He;J. Shao
- Corresponding author: J. Shao
Share Repurchases as a Potential Tool to Mislead Investors
- DOI: 10.2139/ssrn.1485583
- Publication date: 2009
- Journal:
- Impact factor: 0
- Authors: K. Chan;D. Ikenberry;I. Lee;Yanzhi Wang
- Corresponding author: Yanzhi Wang
Other Grants by Yanzhi Wang
Collaborative Research: CSR: Small: Expediting Continual Online Learning on Edge Platforms through Software-Hardware Co-designs
- Award number: 2312158
- Fiscal year: 2023
- Amount: $350K
- Project category: Standard Grant
FET: SHF: Small: Collaborative: Advanced Circuits, Architectures and Design Automation Technologies for Energy-efficient Single Flux Quantum Logic
- Award number: 2008514
- Fiscal year: 2020
- Amount: $350K
- Project category: Standard Grant
CNS Core: Small: Collaborative: Content-Based Viewport Prediction Framework for Live Virtual Reality Streaming
- Award number: 1909172
- Fiscal year: 2019
- Amount: $350K
- Project category: Standard Grant
IRES Track I: U.S.-Japan International Research Experience for Students on Superconducting Electronics
- Award number: 1854213
- Fiscal year: 2019
- Amount: $350K
- Project category: Standard Grant
Similar Overseas Grants
SPX: Collaborative Research: Automated Synthesis of Extreme-Scale Computing Systems Using Non-Volatile Memory
- Award number: 2408925
- Fiscal year: 2023
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Scalable Neural Network Paradigms to Address Variability in Emerging Device based Platforms for Large Scale Neuromorphic Computing
- Award number: 2401544
- Fiscal year: 2023
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Award number: 2412182
- Fiscal year: 2023
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
- Award number: 2318628
- Fiscal year: 2022
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: NG4S: A Next-generation Geo-distributed Scalable Stateful Stream Processing System
- Award number: 2202859
- Fiscal year: 2022
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform
- Award number: 2333009
- Fiscal year: 2022
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Memory Fabric: Data Management for Large-scale Hybrid Memory Systems
- Award number: 2132049
- Fiscal year: 2021
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Automated Synthesis of Extreme-Scale Computing Systems Using Non-Volatile Memory
- Award number: 2113307
- Fiscal year: 2020
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Award number: 1918987
- Fiscal year: 2019
- Amount: $350K
- Project category: Standard Grant
SPX: Collaborative Research: Parallel Algorithm by Blocks - A Data-centric Compiler/runtime System for Productive Programming of Scalable Parallel Systems
- Award number: 1919021
- Fiscal year: 2019
- Amount: $350K
- Project category: Standard Grant