Collaborative Research: OAC Core: CEAPA: A Systematic Approach to Minimize Compression Error Propagation in HPC Applications

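Basic Information

  • Award number:
    2211539
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-08-15 to 2022-10-31
  • Project status:
    Completed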

Project Abstract

Today’s high-performance computing (HPC) applications produce vast volumes of data for post-analysis, placing a major storage and I/O burden on HPC systems. To significantly reduce this burden, researchers have explored the use of lossy compression techniques. While lossy compression can effectively reduce the size of data, it also introduces errors into the compressed data that often lead to incorrect computation results. As a result, scientists hesitate to use lossy compression in their research. Thus, there is a critical need for an effective method to identify compression strategies that minimize error impact across diverse programs. This project aims to develop a systematic approach that helps scientists automatically select the lossy compression algorithm with the lowest error impact based on their HPC programs and target compression ratios. It also integrates educational and outreach activities, including student training and the development of new curriculum on trustworthy data reduction and dependable HPC systems. Modeling compression error propagation in HPC programs is challenging because existing lossy compressors are built on distinct principles and generate substantially different compression errors on diverse HPC data.
This project includes four key thrusts: (1) developing an accurate and efficient fault injection infrastructure that integrates with the fault models of commonly used lossy compression algorithms; (2) designing a fine-grained approach to characterize error propagation in HPC programs through program analysis and decomposition based on the data dependencies and life cycle of compressed data; (3) developing a predictive model using machine learning techniques to select a compression strategy that minimizes the error impact for a given program and compression ratio; and (4) integrating the technique with domain-specific error impact metrics in real-world HPC applications and demonstrating its effectiveness by selecting compression strategies that give low error impact for the same compression ratios. Not only does this project have an enormous positive impact on HPC cyberinfrastructure, but it also helps redefine the optimization of lossy compression techniques with emphasis on both efficiency and error impact. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Design of a Quantization-Based DNN Delta Compression Framework for Model Snapshots and Federated Learning
  • DOI:
    10.1109/tpds.2022.3230840
  • Publication date:
    2023-03
  • Journal:
  • Impact factor:
    5.3
  • Authors:
    Haoyu Jin;Donglei Wu;Shuyu Zhang;Xiangyu Zou;Sian Jin;Dingwen Tao;Qing Liao;Wen Xia
  • Corresponding authors:
    Haoyu Jin;Donglei Wu;Shuyu Zhang;Xiangyu Zou;Sian Jin;Dingwen Tao;Qing Liao;Wen Xia

Other publications by Dingwen Tao

FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiyuan Wei;Fanjiang Ye;Ori Yonay;Xingyu Chen;Baixi Sun;Dingwen Tao;Tianbao Yang
  • Corresponding author:
    Tianbao Yang
Z-checker: A framework for assessing lossy compression of scientific data
Extending checksum-based ABFT to tolerate soft errors online in iterative methods
Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data
  • DOI:
    10.1109/tpds.2020.2972548
  • Publication date:
    2020-07
  • Journal:
  • Impact factor:
    5.3
  • Authors:
    Xiangyu Zou;Tao Lu;Wen Xia;Xuan Wang;Weizhe Zhang;Haijun Zhang;Sheng Di;Dingwen Tao;Franck Cappello
  • Corresponding author:
    Franck Cappello
A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Daoce Wang;Pascal Grosset;Jesus Pulido;Tushar M. Athawale;Jiannan Tian;Kai Zhao;Z. Lukic;Axel Huebl;Zhe Wang;James P. Ahrens;Dingwen Tao
  • Corresponding author:
    Dingwen Tao

Other grants of Dingwen Tao

CAREER: A Highly Effective, Usable, Performant, Scalable Data Reduction Framework for HPC Systems and Applications
  • Award number:
    2232120
  • Fiscal year:
    2023
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: FZ: A fine-tunable cyberinfrastructure framework to streamline specialized lossy compression development
  • Award number:
    2311876
  • Fiscal year:
    2023
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: SHF: Small: Reimagining Communication Bottlenecks in GNN Acceleration through Collaborative Locality Enhancement and Compression Co-Design
  • Award number:
    2326495
  • Fiscal year:
    2023
  • Amount:
    $250,000
  • Project type:
    Standard Grant
CAREER: A Highly Effective, Usable, Performant, Scalable Data Reduction Framework for HPC Systems and Applications
  • Award number:
    2312673
  • Fiscal year:
    2023
  • Amount:
    $250,000
  • Project type:
    Standard Grant
CDS&E: Collaborative Research: HyLoC: Objective-driven Adaptive Hybrid Lossy Compression Framework for Extreme-Scale Scientific Applications
  • Award number:
    2303064
  • Fiscal year:
    2022
  • Amount:
    $250,000
  • Project type:
    Standard Grant
CRII: OAC: An Efficient Lossy Compression Framework for Reducing Memory Footprint for Extreme-Scale Deep Learning on GPU-Based HPC Systems
  • Award number:
    2303820
  • Fiscal year:
    2022
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CEAPA: A Systematic Approach to Minimize Compression Error Propagation in HPC Applications
  • Award number:
    2247060
  • Fiscal year:
    2022
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: Elements: ROCCI: Integrated Cyberinfrastructure for In Situ Lossy Compression Optimization Based on Post Hoc Analysis Requirements
  • Award number:
    2247080
  • Fiscal year:
    2022
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: Elements: ROCCI: Integrated Cyberinfrastructure for In Situ Lossy Compression Optimization Based on Post Hoc Analysis Requirements
  • Award number:
    2104024
  • Fiscal year:
    2021
  • Amount:
    $250,000
  • Project type:
    Standard Grant
CDS&E: Collaborative Research: HyLoC: Objective-driven Adaptive Hybrid Lossy Compression Framework for Extreme-Scale Scientific Applications
  • Award number:
    2042084
  • Fiscal year:
    2020
  • Amount:
    $250,000
  • Project type:
    Standard Grant

Similar NSFC grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Amount:
    0 CNY
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Amount:
    240,000 CNY
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Amount:
    240,000 CNY
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Amount:
    240,000 CNY
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Amount:
    450,000 CNY
  • Project type:
    General Program

Similar overseas grants

Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403312
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2414474
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402947
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403313
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402946
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403090
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
  • Award number:
    2403399
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403089
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Project type:
    Standard Grant