CRII: OAC: An Efficient Lossy Compression Framework for Reducing Memory Footprint for Extreme-Scale Deep Learning on GPU-Based HPC Systems

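Basic information

Award number:
2303820
Principal investigator:
Dingwen Tao
Amount:
$174.6K
Host institution:
Indiana University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-10-01 to 2024-04-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2303820&HistoricalAwards=false
Keywords:
CRII OAC Efficient Lossy Compression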

Project abstract

Deep learning (DL) has rapidly evolved to a state-of-the-art technique in many science and technology disciplines, such as scientific exploration, national security, smart environment, and healthcare. Many of these DL applications require using high-performance computing (HPC) resources to process large amounts of data. Researchers and scientists, for instance, are employing extreme-scale DL applications in HPC infrastructures to classify extreme weather patterns and high-energy particles. In recent years, using Graphics Processing Units (GPUs) to accelerate DL applications has attracted increasing attention. However, the ever-increasing scales of DL applications bring many challenges to today’s GPU-based HPC infrastructures. The key challenge is the huge gap (e.g., one to two orders of magnitude) between the memory requirement and its availability on GPUs. This project aims to fill this gap by developing a novel framework to reduce the memory demand effectively and efficiently via data compression technologies for extreme-scale DL applications. The proposed research will enhance the GPU-based HPC infrastructures in broad communities for many scientific disciplines that rely on DL technologies. The project will connect machine learning and HPC communities and increase interactions between them. Educational and engagement activities include developing new curriculum related to data compression, mentoring a selected group of high school students in a year-long research project for a regional Science Fair competition, and increasing the community's understanding of leveraging HPC infrastructures for DL technologies. The project will also encourage student interest in research related to DL technologies on HPC environment and promote research collaborations with multiple national laboratories.

Existing state-of-the-art GPU memory saving methods for training extreme-scale deep neural networks (DNNs) suffer from high performance overhead and/or low memory footprint reduction. Error-bounded lossy compression is a promising approach to significantly reduce the memory footprint while still meeting the required analysis accuracy. This project will explore how to leverage error-bounded lossy compression on DNN intermediate data to reduce the memory footprint for extreme-scale DNN training. The project has a three-stage research plan. First, the team will comprehensively investigate the impacts of applying error-bounded lossy compression to DNN intermediate data on both validation accuracy and training performance, using different error-bounded lossy compressors, compression modes, and error bounds on the targeted DNNs and datasets. Second, the team will optimize the compression quality of suitable error-bounded lossy compressors on different intermediate data based on the impact analysis outcome, and design an efficient scheme to adaptively apply a best-fit compression solution. Finally, the team will optimize the compression performance on the proposed lossy compression framework for state-of-the-art GPUs. The team will evaluate the proposed framework on high-resolution climate analytics and high-energy particle physics applications and compare it with existing state-of-the-art techniques based on both the memory footprint reduction ratio and training performance improvements (e.g., throughput, time, epoch number). The project will enable scientists and researchers to train extreme-scale DNNs with a given set of computing resources in a fast and efficient manner, opening opportunities for new discoveries.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal papers (19)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs

DOI:
10.1016/j.jpdc.2021.02.013
Published:
2020-02
Journal:
J. Parallel Distributed Comput.
Impact factor:
0
Authors:
Cody Rivera;Jieyang Chen;Nan Xiong;Jing Zhang;S. Song;Dingwen Tao
Corresponding author:
Cody Rivera;Jieyang Chen;Nan Xiong;Jing Zhang;S. Song;Dingwen Tao

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

DOI:
10.1109/dac18072.2020.9218499
Published:
2020-02
Journal:
2020 57th ACM/IEEE Design Automation Conference (DAC)
Impact factor:
0
Authors:
Peiyan Dong;Siyue Wang;Wei Niu;Chengming Zhang;Sheng Lin;Z. Li;Yifan Gong;Bin Ren;X. Lin;Yanzhi Wang;Dingwen Tao
Corresponding author:
Peiyan Dong;Siyue Wang;Wei Niu;Chengming Zhang;Sheng Lin;Z. Li;Yifan Gong;Bin Ren;X. Lin;Yanzhi Wang;Dingwen Tao

HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures

DOI:
10.1145/3559009.3569647
Published:
2022-08
Journal:
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
Impact factor:
0
Authors:
Xinyu Chen;Marco Minutoli;Jiannan Tian;M. Halappanavar;A. Kalyanaraman;Dingwen Tao
Corresponding author:
Xinyu Chen;Marco Minutoli;Jiannan Tian;M. Halappanavar;A. Kalyanaraman;Dingwen Tao

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning

DOI:
10.1145/3447818.3459988
Published:
2020-11
Journal:
Proceedings of the 35th ACM International Conference on Supercomputing
Impact factor:
0
Authors:
Chengming Zhang;Geng Yuan;Wei Niu;Jiannan Tian;Sian Jin;Donglin Zhuang;Zhe Jiang;Yanzhi Wang;Bin Ren;S. Song;Dingwen Tao
Corresponding author:
Chengming Zhang;Geng Yuan;Wei Niu;Jiannan Tian;Sian Jin;Donglin Zhuang;Zhe Jiang;Yanzhi Wang;Bin Ren;S. Song;Dingwen Tao

Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures

DOI:
10.1109/ipdps49936.2021.00097
Published:
2020-10
Journal:
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Impact factor:
0
Authors:
Jiannan Tian;Cody Rivera;S. Di;Jieyang Chen;Xin Liang;Dingwen Tao;F. Cappello
Corresponding author:
Jiannan Tian;Cody Rivera;S. Di;Jieyang Chen;Xin Liang;Dingwen Tao;F. Cappello

Other publications by Dingwen Tao

FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

DOI:
Published:
2024
Journal:
Impact factor:
0
Authors:
Xiyuan Wei;Fanjiang Ye;Ori Yonay;Xingyu Chen;Baixi Sun;Dingwen Tao;Tianbao Yang
Corresponding author:
Tianbao Yang

Z-checker: A framework for assessing lossy compression of scientific data

DOI:
Published:
2017
Journal:
The international journal of high performance computing applications
Impact factor:
0
Authors:
Dingwen Tao;S. Di;Hanqi Guo;Zizhong Chen;F. Cappello
Corresponding author:
F. Cappello

Extending checksum-based ABFT to tolerate soft errors online in iterative methods

DOI:
Published:
2014
Journal:
International Conference on Parallel and Distributed Systems
Impact factor:
0
Authors:
Longxiang Chen;Dingwen Tao;Panruo Wu;Zizhong Chen
Corresponding author:
Zizhong Chen

Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data

DOI:
10.1109/tpds.2020.2972548
Published:
2020-07
Journal:
IEEE Transactions on Parallel and Distributed Systems
Impact factor:
5.3
Authors:
Xiangyu Zou;Tao Lu;Wen Xia;Xuan Wang;Weizhe Zhang;Haijun Zhang;Sheng Di;Dingwen Tao;Franck Cappello
Corresponding author:
Franck Cappello

A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization

DOI:
Published:
2024
Journal:
Impact factor:
0
Authors:
Daoce Wang;Pascal Grosset;Jesus Pulido;Tushar M. Athawale;Jiannan Tian;Kai Zhao;Z. Lukic;Axel Huebl;Zhe Wang;James P. Ahrens;Dingwen Tao
Corresponding author:
Dingwen Tao