CAREER: Achieving Real-Time Machine Learning with Sparsification-Compilation Co-design
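Basic Information
- Award number: 2047516
- Principal investigator:
- Amount: $493,700
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-10-01 to 2026-09-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract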
Machine Learning (ML), particularly Deep Learning (DL), has achieved great success in recent years, especially through Deep Neural Networks (DNNs) of different types. DNNs serve as the state-of-the-art foundation and core enabler of many key applications, such as robotics, high-quality video stream processing, augmented reality, wearable devices, and smart health devices. Achieving high accuracy typically requires DNNs with large and complex model structures, which translates into high computing requirements for both training and inference. Accelerating training on a modern High-Performance Computing (HPC) node or cluster and accelerating inference on a lower-end, power-efficient device have both emerged as major challenges. This project focuses on this problem, viewing DNN training and inference as HPC workloads that need to exploit available multi-level parallelism, complex memory hierarchies, and device heterogeneity, while automating the optimizations through a compiler. If this project succeeds, it will, for the first time, enable real-time machine learning on many edge devices, furthering the success of ML-based end applications that are important to society, the economy, and other areas of science and engineering. This project will also make several contributions to education and to improving diversity, including: (1) introducing HPC in an ML course, and ML workload optimization experience in both undergraduate systems courses and graduate research courses, particularly through engaging demonstration videos; (2) outreach to undergraduates, with the goal of creating interest in (systems) research, and to K-12 students, with the goal of attracting underrepresented groups to computer science.
The key idea of this project for addressing the above challenge is sparsification-compilation co-design. It first introduces a general sparsification idea called fine-grained structured pruning, which prunes weights according to certain fine-grained structures and preserves the non-zero weights in a more regular pattern. Based on this idea, the project designs a high-level abstraction called layer-wise intermediate representation (IR) to capture the sparsity information, with the goal of enabling aggressive compiler optimizations. Building on a successful application of this idea to two-dimensional DNNs, the project undertakes a comprehensive agenda to fully realize the benefits of this approach. First, it unifies the acceleration of Convolutional Neural Networks and Recurrent Neural Networks with a more general fine-grained structured pruning instance and a set of enhanced compiler-based automatic optimizations. Second, it improves the pruning or retraining process itself by extending the compiler optimizations from inference to pruning and by exploiting domain properties to carry out optimized application-level checkpointing. Third, it extends the (compiler-automated) optimization framework to support high-dimensional and extremely deep DNNs. Finally, it explores data reuse across DNNs for situations where multiple DNNs are co-executed on the same device.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (17)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Toward Efficient Interactions between Python and Native Libraries
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Tan, J; Chen, C; Liu, Z; Ren, R; Song, R; Shen, X; Liu, X
- Corresponding author: Liu, X
SparCL: Sparse Continual Learning on the Edge
- DOI: 10.48550/arxiv.2209.09476
- Published: 2022-09
- Journal:
- Impact factor: 0
- Authors: Zifeng Wang; Zheng Zhan; Yifan Gong; Geng Yuan; Wei Niu; T. Jian; Bin Ren; Stratis Ioannidis; Yanzhi Wang; Jennifer G. Dy
- Corresponding authors: Zifeng Wang; Zheng Zhan; Yifan Gong; Geng Yuan; Wei Niu; T. Jian; Bin Ren; Stratis Ioannidis; Yanzhi Wang; Jennifer G. Dy
Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Hsin-Hsuan Sung; Jou-An Chen; Weiguo Niu; Jiexiong Guan; Bin Ren; Xipeng Shen
- Corresponding authors: Hsin-Hsuan Sung; Jou-An Chen; Weiguo Niu; Jiexiong Guan; Bin Ren; Xipeng Shen
Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration
- DOI: 10.1145/3495532
- Published: 2021-11
- Journal:
- Impact factor: 0
- Authors: Yifan Gong; Geng Yuan; Zheng Zhan; Wei Niu; Zhengang Li; Pu Zhao; Yuxuan Cai; Sijia Liu; Bin Ren; Xue Lin; Xulong Tang; Yanzhi Wang
- Corresponding authors: Yifan Gong; Geng Yuan; Zheng Zhan; Wei Niu; Zhengang Li; Pu Zhao; Yuxuan Cai; Sijia Liu; Bin Ren; Xue Lin; Xulong Tang; Yanzhi Wang
GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity
- DOI: 10.1109/tpami.2021.3089687
- Published: 2021-06
- Journal:
- Impact factor: 23.6
- Authors: Wei Niu; Zhengang; Xiaolong Ma; Peiyan Dong; Gang Zhou; Xuehai Qian; Xue Lin; Yanzhi Wang; Bin Ren
- Corresponding authors: Wei Niu; Zhengang; Xiaolong Ma; Peiyan Dong; Gang Zhou; Xuehai Qian; Xue Lin; Yanzhi Wang; Bin Ren
Other publications by Bin Ren
Development of arteriolar niche and self-renewal of breast cancer stem cells by lysophosphatidic Acid/protein kinase D signaling
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Yinan Jiang; Yichen Guo; Jinjin Hao; R. Guenter; J. Lathia; A. Beck; R. Hattaway; D. Hurst; Q. Wang; Yehe Liu; Qi Cao; H. Krontiras; He; R. Silverstein; Bin Ren
- Corresponding author: Bin Ren
Revealing Protein Binding Affinity on Metal Surfaces: An Electrochemistry Approach
- DOI: 10.1039/d1cc07098c
- Published: 2022
- Journal:
- Impact factor: 4.9
- Authors: Danya Lyu; Pingshi Wang; Shuo Zhang; Guokun Liu; Bin Ren
- Corresponding author: Bin Ren
Development of Weak Signal Recognition and an Extraction Algorithm for Raman Imaging
- DOI:
- Published: 2019
- Journal:
- Impact factor: 7.4
- Authors: Xin Wang; Guokun Liu; Mengxi Xu; Bin Ren; Zhongqun Tian
- Corresponding author: Zhongqun Tian
Classification of 2-step nilpotent Lie algebras of dimension 8 with 3-dimensional center
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Bin Ren; Linsheng Zhu
- Corresponding author: Linsheng Zhu
Grouped Temporal Enhancement Module for Human Action Recognition
- DOI: 10.1109/icip40778.2020.9190958
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Hong Liu; Bin Ren; Mengyuan Liu; Runwei Ding
- Corresponding author: Runwei Ding
Other grants by Bin Ren
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
- Award number: 2403088
- Fiscal year: 2024
- Amount: $493,700
- Project type: Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
- Award number: 2230944
- Fiscal year: 2023
- Amount: $493,700
- Project type: Standard Grant
Collaborative Research: SHF: SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity
- Award number: 2146873
- Fiscal year: 2022
- Amount: $493,700
- Project type: Standard Grant
EAGER: Collaborative Research: On the Theoretical Foundation of Recommendation System Evaluation
- Award number: 2142681
- Fiscal year: 2021
- Amount: $493,700
- Project type: Standard Grant
Similar overseas grants
RTML: Small: Achieving Real-Time and Energy-efficient Computing for 5G Networks (ARTEN): A Deep Reservoir Computing Approach
- Award number: 1937487
- Fiscal year: 2019
- Amount: $493,700
- Project type: Standard Grant
Achieving real-time nowcasting of corporate performance and stable stock markets using POS data
- Award number: 17K01277
- Fiscal year: 2017
- Amount: $493,700
- Project type: Grant-in-Aid for Scientific Research (C)
Achieving predictability, repeatability and performance for hard real-time embedded systems
- Award number: 386714-2010
- Fiscal year: 2014
- Amount: $493,700
- Project type: Discovery Grants Program - Individual
Achieving predictability, repeatability and performance for hard real-time embedded systems
- Award number: 386714-2010
- Fiscal year: 2013
- Amount: $493,700
- Project type: Discovery Grants Program - Individual
Achieving predictability, repeatability and performance for hard real-time embedded systems
- Award number: 386714-2010
- Fiscal year: 2012
- Amount: $493,700
- Project type: Discovery Grants Program - Individual
Achieving predictability, repeatability and performance for hard real-time embedded systems
- Award number: 386714-2010
- Fiscal year: 2011
- Amount: $493,700
- Project type: Discovery Grants Program - Individual
Achieving predictability, repeatability and performance for hard real-time embedded systems
- Award number: 386714-2010
- Fiscal year: 2010
- Amount: $493,700
- Project type: Discovery Grants Program - Individual
Project REACH (Real Elders Achieving Community Health)
- Award number: 8102923
- Fiscal year: 2008
- Amount: $493,700
- Project type:
Project REACH (Real Elders Achieving Community Health)
- Award number: 7678877
- Fiscal year: 2008
- Amount: $493,700
- Project type:
Project REACH (Real Elders Achieving Community Health)
- Award number: 8298466
- Fiscal year: 2008
- Amount: $493,700
- Project type: