Deep Learning Processor Architecture - 猫眼课题宝

Deep Learning Processor Architecture

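Approval number: 61732002

Project type: Key Program

Funding: 3.05 million yuan (305.0 万元)

Principal investigator: Qian Depei (钱德沛)

Host institution: Beihang University (北京航空航天大学)

Discipline: F0204. Computer architecture and hardware technology

Completion year: 2022

Approval year: 2017

Project status: Completed

Participants: Chen Tianshi (陈天石); Song Fuxing (宋福兴); Dong Qingxiu (董清秀); Liu Yi (刘轶); Li Dong (李栋); Qian Cheng (钱诚); Wang Rui (王锐); Yang Hailong (杨海龙); Song Ping (宋平)

Keywords: instruction set; microarchitecture; deep learning; algorithm library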

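Chinese abstract (translated)

Deep learning is widely regarded as one of the most important intelligent processing technologies and is a major workload across application scenarios ranging from supercomputers and data centers to smartphones and embedded devices. However, existing chips struggle to satisfy simultaneously the three demands these scenarios place on deep learning processing: high performance, low energy consumption, and low latency. To develop a new deep learning processor that meets all three demands, this project proposes to exploit deep learning's inherent tolerance of inexactness in computation, improving the chip's deep learning processing capability by appropriately relaxing computational precision. We plan to abstract an instruction set for inexact deep learning processors (comprising computation, memory access, and control instructions) and, targeting this instruction set, to explore the microarchitecture, algorithm library, and system software of deep learning processors. These research results will be integrated into a prototype deep learning processor chip. Based on this prototype, we will also build a demonstration of bounded-error deep learning applications for health management. Through full-chain research spanning five aspects (instruction set, microarchitecture, algorithm library, system software, and demonstration applications), this project is expected to achieve order-of-magnitude improvements in the speed and energy efficiency of deep learning processors at the architectural level, and to provide a reference for the development of domestic deep learning processors.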

English abstract

Deep learning is widely recognized as one of the most important intelligent processing techniques, and it is a crucial workload on supercomputers, data centers, mobile phones, and embedded devices. However, current chips cannot simultaneously meet the three requirements of deep learning applications: high performance, low energy consumption, and low latency. To develop a novel deep learning architecture that meets all three requirements, this project plans to exploit deep learning's tolerance of inexact computation in order to improve the efficiency of deep learning chips. We plan to extract an instruction set (including computation, memory access, and control instructions) for inexact deep learning processing. Based on this instruction set, we will investigate the microarchitecture, library, and system software of deep learning processors. The results will be integrated into a test chip of a deep learning processor. With this test chip, we will also build a series of inexact deep learning application demos for health management. Our systematic research on deep learning processors may improve their performance and energy efficiency by two orders of magnitude and provide useful guidance for the development of domestic deep learning processors.
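The tolerance of inexact computation that the abstract relies on can be illustrated with a small sketch. The code below is not the project's instruction set or hardware design; it assumes a plain symmetric 8-bit linear quantization scheme, and the names `quantize` and `quantized_dot` as well as the sample numbers are hypothetical. It compares an exact floating-point dot product with one computed from low-precision integers.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization: signed integers plus one shared scale.

    Assumption for illustration only; not the scheme used by the project.
    """
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def quantized_dot(weights, inputs, bits=8):
    """Dot product using integer multiply-accumulate, rescaled at the end."""
    qw, sw = quantize(weights, bits)
    qx, sx = quantize(inputs, bits)
    acc = sum(a * b for a, b in zip(qw, qx))   # integer arithmetic only
    return acc * sw * sx


# Hypothetical sample data.
weights = [0.12, -0.53, 0.98, 0.07]
inputs = [1.5, -2.0, 0.25, 3.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
rel_error = abs(approx - exact) / abs(exact)
print(f"exact={exact:.4f} approx={approx:.4f} relative error={rel_error:.2%}")
```

For this input the relative error stays below 5%, while every multiplication operates on 8-bit integers. Narrower arithmetic of this kind is one way relaxed precision can translate into smaller and more energy-efficient hardware.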

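To address the slow speed, high power consumption, and long latency of deep learning computation on conventional processors, this project took as its core idea the full exploitation of deep learning's tolerance of inexact computation. Starting from deep learning processor design, it carried out research at five levels: instruction set, processor microarchitecture, basic algorithm library, system software, and application demonstration. The main representative work completed is as follows:

1) Proposed a new domain-specific instruction set architecture for neural network accelerators, integrating scalar, vector, matrix, logical, data transfer, and control instructions.

2) Proposed several deep learning processor architectures that support sparse neural networks (Cambricon-X, Cambricon-S, and Cambricon-SE), as well as Cambricon-Q, which supports inexact training. These architectures effectively exploit weight sparsity and neuron sparsity in neural networks to improve the computational efficiency and speed of neural network models. Compared with the early deep learning processor DianNao, the latest Cambricon-SE processor improves performance and energy efficiency by 10x and 20x respectively, and can process real-time 1080p video at 76.59 fps.

3) Proposed lightweight basic algorithms for neural networks, including adaptive pruning and decomposable Winograd, and explored performance optimization methods for extremely low-bit convolution on different architectures.

4) Developed an assembler for deep learning processors and proposed a compilation technique that supports multiple deep learning processor architectures and can be matched quickly to actual hardware. Implemented ZeroSpy, a redundant-zero detection tool that can reduce wasted memory and computing resources.

The project published a total of 50 high-level papers: 22 conference and journal papers in Class A of the China Computer Federation (CCF) recommended list, 5 Class B conference and journal papers, 11 Class C conference and journal papers, and 12 other conference and journal papers. It applied for 15 invention patents. The project trained 12 doctoral students and 28 master's students, and the doctoral dissertation of project member Zhao Yongwei (赵永威) received the 2021 CCF Outstanding Doctoral Dissertation Award.

The project's research on deep learning processor architectures supporting sparse neural networks and neural network quantization, on instruction sets for deep learning processors, and on deep learning compilation optimization has attracted wide attention. The related papers have been cited frequently and have had considerable academic impact. In summary, the project completed its planned research content, obtained important results, achieved its expected goals, and successfully completed its research tasks.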
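The weight sparsity and neuron sparsity mentioned in item 2 can be illustrated in software. This is only a sketch under stated assumptions: it uses simple static magnitude pruning rather than the project's adaptive pruning, it does not model any Cambricon microarchitecture, and the helper names and sample values are hypothetical. It shows how storing only nonzero weights and skipping zero activations reduces the number of multiply-accumulate operations without changing the result for the pruned model.

```python
def prune_by_magnitude(weights, keep_ratio):
    """Zero out all but the largest-magnitude weights (static pruning).

    Ties at the threshold are all kept, so slightly more than
    keep_ratio of the weights may survive.
    """
    keep = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def compress(weights):
    """Store only nonzero weights as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


def sparse_dot(compressed_weights, activations):
    """Multiply-accumulate that skips zero weights and zero activations."""
    total = 0.0
    macs = 0
    for i, w in compressed_weights:      # zero weights were never stored
        x = activations[i]
        if x == 0.0:                     # neuron (activation) sparsity
            continue
        total += w * x
        macs += 1
    return total, macs


# Hypothetical sample data.
weights = [0.9, -0.05, 0.02, -0.7, 0.01, 0.4, -0.03, 0.6]
activations = [1.0, 2.0, 0.0, 0.5, 3.0, 0.0, 1.0, 2.0]

pruned = prune_by_magnitude(weights, keep_ratio=0.5)
result, macs = sparse_dot(compress(pruned), activations)
dense = sum(w * x for w, x in zip(pruned, activations))
print(f"sparse={result:.2f} dense={dense:.2f} MACs={macs} of {len(weights)}")
```

Here the pruned layer needs 3 multiply-accumulates instead of 8. A hardware design additionally has to handle the irregular indexing efficiently, which this sketch does not attempt to capture.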

Journal paper list

Bench IP: Benchmarking Intelligence Processors

DOI: 10.1007/s11390-018-1805-8

Published: 2018

Journal: Journal of Computer Science and Technology

Impact factor: 0.7

Authors: Tao Jin Hua; Du Zi Dong; Guo Qi; Lan Hui Ying; Zhang Lei; Zhou Sheng Yuan; Xu Ling Jie; Liu Cong; Liu Hai Feng; Tang Shan; Rush Allen; Chen Willian; Liu Shao Li; Chen Yun Ji; Chen Tian Shi

Corresponding author: Chen Tian Shi

Efficient detection of silent data corruption in HPC applications with synchronization-free message verification

DOI: 10.1007/s11227-021-03892-4

Published: 2021-06

Journal: The Journal of Supercomputing

Impact factor: --

Authors: Guozhen Zhang; Yi Liu; Hailong Yang; Depei Qian

Corresponding author: Depei Qian

Mutual calibration training: Training deep neural networks with noisy labels using dual-models