Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing

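Basic information

  • Award number:
    2008388
  • Principal investigator:
  • Award amount:
    $292.9K
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-10-01 to 2023-09-30
  • Project status:
    Completed

Abstract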

Modern scientific research heavily relies on supercomputers. Supercomputing applications, such as traditional numerical simulations (HPC), data intensive applications (Big Data), and most recently, deep learning (DL) applications, are increasingly run on supercomputers to obtain timely results and explore new research methods that combine multiple application types. However, a bottleneck in their design reduces the potential performance of modern supercomputers. This project, bbThemis, addresses this problem by enabling efficient and policy-driven sharing of an intermediate storage layer known as a "burst buffer", so that more scientists and applications can leverage state-of-the-art storage techniques to significantly reduce their runtime and enhance the productivity of their research. This project will deliver substantial gains to almost every research area that uses HPC resources, leading to improved science and engineering methods and products in all fields. This research will have an immediate and significant impact on existing scientific applications and on deriving guidelines for next-generation HPC system design, deployment, and utilization. The project will also contribute to educational outcomes. In addition to students working directly on project goals, results developed in the project will be used in tutorial and training sessions at Texas Advanced Computing Center’s summer institute in deep learning and other major conferences, and in University of Illinois Urbana-Champaign student projects. The project is aligned with the National Strategic Computing Initiative (NSCI) to advance US leadership in HPC.

This project, bbThemis (https://github.com/bbThemis), leverages a suite of technologies, such as disassociation of I/O processing from control logic, time-sliced intra I/O node sharing, function interception for low overhead POSIX I/O, and metadata and data placement for optimal individual application performance. It is investigating how to best apply these technologies, by: 1) Identifying optimal burst buffer configurations for a suite of representative supercomputing applications; 2) Proposing, prototyping, and verifying different design options to address intra-node and inter-node I/O performance sharing; and 3) Designing and evaluating a set of sharing policies, such as fair sharing and priority sharing, with real applications and I/O traces. This project will dramatically increase the sharing capacity of existing burst buffers and enhance domain scientists’ productivity at a large scale. It explores various sharing policies that permit efficient sharing of I/O resources and that meet the requirements of computing centers. The results will enable the provisioning of I/O resources, where users can request specific IOPS or bandwidth for a period of time. The prototype burst buffer sharing framework will immediately increase the capacity of existing supercomputers with enhanced I/O performance. The lessons learned will guide next-generation I/O system design for large scale systems. The general improvement of HPC, Big Data, and DL applications will also increase the coherence of the hardware and software used for data analytics computing and modeling and simulation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
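
The abstract names fair sharing and priority sharing as candidate policies but does not say how they are computed. As a minimal sketch only, and not bbThemis's actual implementation, both can be viewed as proportional allocation of a burst buffer's bandwidth by per-job weights: equal weights give fair sharing, unequal weights give priority sharing. The function name `share_bandwidth`, the job names, the weights, and the 1200 MB/s total are all hypothetical.

```python
from typing import Dict


def share_bandwidth(total_mbps: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split total bandwidth among jobs in proportion to their weights.

    Hypothetical helper for illustration; not part of bbThemis.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one job must have a positive weight")
    return {job: total_mbps * w / total_weight for job, w in weights.items()}


# Fair sharing: every job carries the same weight.
fair = share_bandwidth(1200.0, {"sim": 1, "train": 1, "analytics": 1})

# Priority sharing: higher-priority jobs carry larger weights.
prio = share_bandwidth(1200.0, {"sim": 3, "train": 2, "analytics": 1})

print(fair)
print(prio)
```

Under these assumed inputs each job receives 400 MB/s in the fair case, while the 3:2:1 weights yield 600, 400, and 200 MB/s. A real policy would also have to handle jobs that cannot use their full share and requests for a guaranteed IOPS or bandwidth level over a time window, which this sketch leaves out.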

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fine-Grained Policy-Driven I/O Sharing for Burst Buffers

Other publications by Zhao Zhang

A 1-V 5.2–5.7 GHz low noise sub-sampling phase locked loop in 0.18 μm CMOS
Probe-Type Microforce Sensor for Mirco/Nano Experimental Mechanics
  • DOI:
    10.4028/www.scientific.net/amr.33-37.943
  • Publication date:
    2008-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xide Li;Zhao Zhang
  • Corresponding author:
    Zhao Zhang
3D trajectory tracking control of an underactuated AUV based on adaptive neural network dynamic surface
An efficient and convenient formal synthesis of Jaspine B from D-xylose.
  • DOI:
    10.1016/j.carres.2012.01.013
  • Publication date:
    2012-04
  • Journal:
  • Impact factor:
    3.1
  • Authors:
    Zhao Zhang;Yu-Tao Zhao;Wen Qu;Hong-Min Liu
  • Corresponding author:
    Hong-Min Liu
Development of a Procedure for Prioritizing Intersection Improvement Projects Considering Safety and Operational Factors
  • DOI:
    10.1080/19439962.2011.614374
  • Publication date:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhenyu Wang;Zhao Zhang;J. Lu;Jianyou Zhao
  • Corresponding author:
    Jianyou Zhao

Other grants by Zhao Zhang

CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
  • Award number:
    2340011
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
  • Award number:
    2411294
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
  • Award number:
    2312689
  • Fiscal year:
    2023
  • Award amount:
    $292.9K
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
  • Award number:
    2401244
  • Fiscal year:
    2023
  • Award amount:
    $292.9K
  • Project type:
    Continuing Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
  • Award number:
    2311766
  • Fiscal year:
    2023
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
  • Award number:
    2401246
  • Fiscal year:
    2023
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
  • Award number:
    2401245
  • Fiscal year:
    2023
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
  • Award number:
    2106661
  • Fiscal year:
    2021
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
  • Award number:
    1643271
  • Fiscal year:
    2016
  • Award amount:
    $292.9K
  • Project type:
    Continuing Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
  • Award number:
    1514229
  • Fiscal year:
    2015
  • Award amount:
    $292.9K
  • Project type:
    Continuing Grant

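Similar NSFC grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Award amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Award amount:
    CNY 450,000
  • Project type:
    General Program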

Similar overseas grants

Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403312
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2414474
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402947
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403313
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402946
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403090
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
  • Award number:
    2403399
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403089
  • Fiscal year:
    2024
  • Award amount:
    $292.9K
  • Project type:
    Standard Grant