SPX: Collaborative Research: FASTLEAP: FPGA based compact Deep Learning Platform

Basic Information

Award number:
1919289
Principal investigator:
Xuehai Qian
Amount:
$848,700
Institution:
University of Southern California
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-10-01 to 2023-07-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1919289&HistoricalAwards=false
Keywords:
SPX Collaborative Research FASTLEAP FPGA

Project Abstract

With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used because of their high accuracy, excellent scalability, and self-adaptiveness. Many applications employ DNNs as the core technology, such as face detection, speech recognition, and scene parsing. To meet the high accuracy requirements of various applications, DNN models are becoming deeper and larger, and are evolving at a fast pace. They are computation and memory intensive and pose significant challenges to the conventional Von Neumann architecture used in computing. The key problem addressed by the project is how to accelerate deep learning: not only inference, but also training and model compression, which have not received enough attention in prior research. This endeavor has the potential to enable the design of fast and energy-efficient deep learning systems, applications of which are found in our daily lives, ranging from autonomous driving through mobile devices to IoT systems, thus benefiting society at large.
The outcome of this project is FASTLEAP, a Field-Programmable Gate Array (FPGA)-based platform for accelerating deep learning. The platform takes a dataset as input and outputs a model that is trained, pruned, and mapped onto an FPGA, optimized for fast inference. The project will utilize emerging FPGA technologies that have access to High Bandwidth Memory (HBM) and include floating-point DSP units. From a vertical perspective, FASTLEAP integrates innovations from multiple levels of the system stack, from algorithm and architecture down to efficient FPGA hardware implementation. From a horizontal perspective, it embraces systematic DNN model compression and the associated FPGA-based training, as well as FPGA-based inference acceleration of compressed DNN models. The platform will be delivered as a complete solution, with both the software tool chain and the hardware implementation, to ensure ease of use.
At the algorithm level of FASTLEAP, the proposed Alternating Direction Method of Multipliers for Neural Networks (ADMM-NN) framework will perform unified weight pruning and quantization, given training data, target accuracy, and target FPGA platform characteristics (performance models, inter-accelerator communication). The training procedure in ADMM-NN is performed on a platform with multiple FPGA accelerators, dictated by the architecture-level optimizations on communication and parallelism. Finally, the optimized FPGA inference design is generated based on the trained and compressed DNN model, accounting for FPGA performance modeling. The project will address the following SPX research areas: 1) Algorithms: Bridging the gap between deep learning developments in theory and their system implementations, cognizant of the performance model of the platform. 2) Applications: Scaling of deep learning for domains such as image processing. 3) Architecture and Systems: Automatic generation of deep learning designs on FPGA, optimizing area, energy efficiency, latency, and throughput.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
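The abstract names ADMM as the basis for weight pruning but gives no algorithmic detail. As a rough illustration only, the sketch below applies the standard ADMM splitting to a toy problem: a least-squares model whose weight vector may have at most `k` nonzeros. The linear model, the closed-form weight update, and the names `project_topk`, `admm_prune`, `rho`, and `iters` are all assumptions made for this example and are not taken from the project. In a DNN setting the weight update would typically be a few epochs of gradient-based training on the loss plus the quadratic penalty, and a quantization constraint would swap in a projection onto the allowed weight levels.

```python
import numpy as np


def project_topk(v, k):
    """Euclidean projection onto vectors with at most k nonzeros:
    keep the k largest-magnitude entries and zero the rest."""
    z = np.zeros_like(v)
    if k > 0:
        keep = np.argsort(np.abs(v))[-k:]
        z[keep] = v[keep]
    return z


def admm_prune(X, y, k, rho=1.0, iters=50):
    """Toy ADMM pruning for 0.5 * ||X w - y||^2 subject to nnz(w) <= k.

    w is the unconstrained weight copy, z is the sparse copy, and u is the
    scaled dual variable that pulls the two copies together.
    """
    n_features = X.shape[1]
    z = np.zeros(n_features)
    u = np.zeros(n_features)
    # The w-update is a ridge-like linear system whose matrix never changes.
    A = X.T @ X + rho * np.eye(n_features)
    Xty = X.T @ y
    for _ in range(iters):
        # w-step: minimize the loss plus (rho / 2) * ||w - z + u||^2.
        w = np.linalg.solve(A, Xty + rho * (z - u))
        # z-step: project onto the sparsity constraint.
        z = project_topk(w + u, k)
        # u-step: accumulate the remaining disagreement between w and z.
        u = u + w - z
    return z  # the sparse copy satisfies the constraint exactly


# Hypothetical data: 10 features, of which only 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[1, 4, 7]] = [2.0, -3.0, 1.5]
y = X @ w_true

w_pruned = admm_prune(X, y, k=3)
print(np.count_nonzero(w_pruned), w_pruned)
```

Returning `z` rather than `w` is a deliberate choice in this sketch: `z` meets the sparsity constraint by construction, whereas `w` only approaches it as the iterations proceed.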

Project Outcomes

Journal papers (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

FASTHash: FPGA-Based High Throughput Parallel Hash Table

DOI:
10.1007/978-3-030-50743-5_1
Publication date:
2020-05-22
Journal:
High Performance Computing
Impact factor:
0
Authors:
Yang Y;Kuppannagari SR;Srivastava A;Kannan R;Prasanna VK
Corresponding author:
Prasanna VK

Hardware Acceleration of Large Scale GCN Inference

DOI:
10.1109/asap49362.2020.00019
Publication date:
2020-07
Journal:
2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Impact factor:
0
Authors:
Bingyi Zhang;Hanqing Zeng;V. Prasanna
Corresponding author:
Bingyi Zhang;Hanqing Zeng;V. Prasanna

A High Throughput Parallel Hash Table on FPGA using XOR-based Memory

DOI:
10.1109/hpec43674.2020.9286199
Publication date:
2020-09
Journal:
2020 IEEE High Performance Extreme Computing Conference (HPEC)
Impact factor:
0
Authors:
Ruizhi Zhang;Sasindu Wijeratne;Yang Yang-Yang;S. Kuppannagari;V. Prasanna
Corresponding author:
Ruizhi Zhang;Sasindu Wijeratne;Yang Yang-Yang;S. Kuppannagari;V. Prasanna

Other publications by Xuehai Qian

Response characterization on the microstructure, and mechanical and corrosion behavior of clad rebars of different weld materials

DOI:
10.1016/j.cscm.2025.e04316
Publication date:
2025-07-01
Journal:
Case Studies in Construction Materials
Impact factor:
6.600
Authors:
Zecheng Zhuang;Xuehai Qian;Lei Zeng;Weiping Lu;Zhen Li;Yong Xiang
Corresponding author:
Yong Xiang

Effects of varying weld speeds on the microstructure, mechanical properties, and corrosion behavior of clad rebars in a marine environment

DOI:
10.1038/s41598-025-08448-7
Publication date:
2025-07-02
Journal:
Scientific Reports
Impact factor:
3.900
Authors:
Zecheng Zhuang;Weiping Lu;Zhe Gou;Lei Zeng;Xuehai Qian;Rifeng Wang;Erte Lin;Zhen Li;Yong Xiang;Jianping Tan
Corresponding author:
Jianping Tan

Graph Transformer for Quantum Circuit Reliability Prediction

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Hanrui Wang;Pengyu Liu;Jinglei Cheng;Zhiding Liang;Jiaqi Gu;Zi;Yongshan Ding;Weiwen Jiang;Yiyu Shi;Xuehai Qian;D. Pan;F. Chong;Song Han
Corresponding author:
Song Han

RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

DOI:
Publication date:
2023
Journal:
arXiv.org
Impact factor:
0
Authors:
Hanrui Wang;Yilian Liu;Pengyu Liu;Jiaqi Gu;Zi;Zhiding Liang;Jinglei Cheng;Yongshan Ding;Xuehai Qian;Yiyu Shi;David Z. Pan;Frederic T. Chong;Song Han
Corresponding author:
Song Han