SHF: Medium: Collaborative Research: ECC: Ephemeral Coherence Cohort for I/O Containerization and Disaggregation

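Basic Information

  • Award number:
    1763540
  • Principal investigator:
  • Amount:
    $500K
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-06-01 to 2023-05-31
  • Project status:
    Completed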

Project Abstract

Leadership computing facilities for high-performance computing (HPC) invest heavily in file and storage systems. The reason is that the storage system is often the Achilles' heel of an HPC system, as it is fraught with numerous scenarios for contention, congestion, and performance variability. This problem is getting worse due to (a) the increased importance of data-driven HPC and the growth in the amount of data generated by large-scale simulation, and (b) the slower growth of disk speed compared to CPU speed. The addition of high-bandwidth persistent memory devices as burst buffers brings new opportunities for fast caching of application data while still allowing data persistence. However, the conventional approach of exploiting burst buffers as yet another caching layer cannot reduce the lengthy and costly data processing steps in the deep I/O stack or reconcile occasional contention inside the complex storage system. This project therefore seeks to exploit burst buffers as repositories of persistent, application-specific parallel file systems, with a lifetime commensurate with the lifetime of an application or an application campaign on an HPC system.

This is a collaborative project between the University of Illinois at Urbana-Champaign and Florida State University. The project formulates a research framework called Ephemeral Coherence Cohort (ECC) that offers an abstraction to represent the active collection of application data through containerization, insulate I/O activities across different applications, and enable storage disaggregation for ephemeral allocation and dynamic utilization of burst buffers. The proposed ECC framework aims to enhance a variety of mission-critical applications running on the leadership computing facilities of the Department of Energy and the National Science Foundation. The project strengthens the collaboration between the University of Illinois at Urbana-Champaign and Florida State University. It also plans to organize panels and birds-of-a-feather sessions on burst buffer research at upcoming HPC conferences and to collaborate with leaders of supercomputing centers so that techniques from this research reach the wider community.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
File System Semantics Requirements of HPC Applications
Pinpointing crash-consistency bugs in the HPC I/O stack: a cross-layer approach
  • DOI:
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sun, J.; Huang, J.; Snir, M.
  • Corresponding author:
    Snir, M.
I/O Traces of HPC Applications
Verifying IO Synchronization from MPI Traces.
Recorder 2.0: Efficient Parallel I/O Tracing and Analysis

Other publications by Marc Snir

Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization
  • DOI:
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yosuke Oyama; Naoya Maruyama; Nikoli Dryden; Peter Harrington; Jan Balewski; Satoshi Matsuoka; Marc Snir; Peter Nugent; Brian Van Essen
  • Corresponding author:
    Brian Van Essen
Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications
  • DOI:
    10.1007/s10766-019-00634-1
  • Publication date:
    2019-03-23
  • Journal:
  • Impact factor:
    0.900
  • Authors:
    Feng Zhang; Jidong Zhai; Marc Snir; Hai Jin; Hironori Kasahara; Mateo Valero
  • Corresponding author:
    Mateo Valero
Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jinghan Sun; Zibo Gong; Anup Agarwal; Shadi Noghabi; Ranveer Chandra; Marc Snir; Jian Huang
  • Corresponding author:
    Jian Huang
Design and Analysis of the Network Software Stack of an Asynchronous Many-task System -- The LCI parcelport of HPX
  • DOI:
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jiakun Yan; Hartmut Kaiser; Marc Snir
  • Corresponding author:
    Marc Snir

Other grants by Marc Snir

OAC Core: Small: Collaborative Research: Scalable Run-Time for Highly Parallel, Heterogeneous Systems
  • Award number:
    1908144
  • Fiscal year:
    2019
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
SHF: Small: Collaborative Research: ALETHEIA: A Framework for Automatic Detection/Correction of Corruptions in Extreme Scale Scientific Executions
  • Award number:
    1617488
  • Fiscal year:
    2016
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
XPS: FP: Collaborative Research: Parallel Irregular Programs: From High-Level Specifications to Run-time Optimizations
  • Award number:
    1337217
  • Fiscal year:
    2013
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
G8 Initiative: Collaborative Research: ECS: Enabling Climate Simulation at Extreme Scale
  • Award number:
    1062790
  • Fiscal year:
    2011
  • Funding amount:
    $500K
  • Award type:
    Continuing Grant
Deterministic Parallel Programming for High Performance Computing
  • Award number:
    0833128
  • Fiscal year:
    2008
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Communication Complexity of Parallel Algorithms
  • Award number:
    8203307
  • Fiscal year:
    1982
  • Funding amount:
    $500K
  • Award type:
    Standard Grant

Similar international grants

Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
  • Award number:
    2403134
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402804
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
  • Award number:
    2403408
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Toward Understandability and Interpretability for Neural Language Models of Source Code
  • Award number:
    2423813
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402806
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
  • Award number:
    2403135
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
  • Award number:
    2403409
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402805
  • Fiscal year:
    2024
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: High-Performance, Verified Accelerator Programming
  • Award number:
    2313024
  • Fiscal year:
    2023
  • Funding amount:
    $500K
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Verifying Deep Neural Networks with Spintronic Probabilistic Computers
  • Award number:
    2311295
  • Fiscal year:
    2023
  • Funding amount:
    $500K
  • Award type:
    Continuing Grant