Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing

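Basic Information

  • Award number:
    2008286
  • Principal investigator:
  • Amount:
    $206,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-10-01 to 2023-09-30
  • Project status:
    Completed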

Project Abstract

Modern scientific research heavily relies on supercomputers. Supercomputing applications, such as traditional numerical simulations (HPC), data intensive applications (Big Data), and most recently, deep learning (DL) applications, are increasingly run on supercomputers to obtain timely results and explore new research methods that combine multiple application types. However, a bottleneck in their design reduces the potential performance of modern supercomputers. This project, bbThemis, addresses this problem by enabling efficient and policy-driven sharing of an intermediate storage layer known as a "burst buffer", so that more scientists and applications can leverage state-of-the-art storage techniques to significantly reduce their runtime and enhance the productivity of their research. This project will deliver substantial gains to almost every research area that uses HPC resources, leading to improved science and engineering methods and products in all fields. This research will have an immediate and significant impact on existing scientific applications and on deriving guidelines for next-generation HPC system design, deployment, and utilization. The project will also contribute to educational outcomes. In addition to students working directly on project goals, results developed in the project will be used in tutorial and training sessions at Texas Advanced Computing Center’s summer institute in deep learning and other major conferences, and in University of Illinois Urbana-Champaign student projects. The project is aligned with the National Strategic Computing Initiative (NSCI) to advance US leadership in HPC.

This project, bbThemis (https://github.com/bbThemis), leverages a suite of technologies, such as disassociation of I/O processing from control logic, time-sliced intra I/O node sharing, function interception for low overhead POSIX I/O, and metadata and data placement for optimal individual application performance. It is investigating how to best apply these technologies, by: 1) Identifying optimal burst buffer configurations for a suite of representative supercomputing applications; 2) Proposing, prototyping, and verifying different design options to address intra-node and inter-node I/O performance sharing; and 3) Designing and evaluating a set of sharing policies, such as fair sharing and priority sharing, with real applications and I/O traces. This project will dramatically increase the sharing capacity of existing burst buffers and enhance domain scientists’ productivity at a large scale. It explores various sharing policies that permit efficient sharing of I/O resources and that meet the requirements of computing centers. The results will enable the provisioning of I/O resources, where users can request specific IOPS or bandwidth for a period of time. The prototype burst buffer sharing framework will immediately increase the capacity of existing supercomputers with enhanced I/O performance. The lessons learned will guide next-generation I/O system design for large scale systems. The general improvement of HPC, Big Data, and DL applications will also increase the coherence of the hardware and software used for data analytics computing and modeling and simulation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
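
The fair-sharing and priority-sharing policies named in the abstract can be illustrated with a small sketch. The Python below is not taken from bbThemis. It assumes a single burst buffer with a fixed bandwidth capacity and per-job bandwidth demands, and the function names `fair_share` and `priority_share`, the job names, and the numbers are all hypothetical. Fair sharing is modeled here as max-min fairness and priority sharing as strict priority order; the project may define either policy differently.

```python
from typing import Dict


def fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair allocation (an assumed reading of "fair sharing").

    No job receives more than it asks for, and bandwidth left over by
    small jobs is redistributed among the jobs that still want more.
    """
    alloc = {job: 0.0 for job in demands}
    remaining = capacity
    unsatisfied = dict(demands)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Jobs whose demand fits within the equal share are served in full.
        done = {j: d for j, d in unsatisfied.items() if d <= share}
        if not done:
            # Every remaining job wants more than the equal share: split evenly.
            for j in unsatisfied:
                alloc[j] += share
            break
        for j, d in done.items():
            alloc[j] += d
            remaining -= d
            del unsatisfied[j]
    return alloc


def priority_share(
    capacity: float, demands: Dict[str, float], priority: Dict[str, int]
) -> Dict[str, float]:
    """Strict priority allocation (an assumed reading of "priority sharing").

    Jobs are served in ascending order of their priority number; a job
    gets its full demand or whatever capacity is still left.
    """
    alloc = {}
    remaining = capacity
    for job in sorted(demands, key=lambda j: priority[j]):
        alloc[job] = min(demands[job], remaining)
        remaining -= alloc[job]
    return alloc


if __name__ == "__main__":
    # Hypothetical jobs and bandwidth demands in GB/s.
    demands = {"a": 2.0, "b": 4.0, "c": 10.0}
    print(fair_share(10.0, demands))
    print(priority_share(10.0, demands, {"c": 0, "a": 1, "b": 2}))
```

With a capacity of 10 and demands of 2, 4, and 10, the fair-share function serves the two small jobs in full and gives the largest job the remaining 4, whereas strict priority with job `c` ranked first gives `c` the entire capacity. The contrast shows why the choice of policy matters to a computing center.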

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fine-Grained Policy-Driven I/O Sharing for Burst Buffers

Other publications by Daniel Katz

HCV recurrence and death after viral clearance in HCV-viremic donor to HCV-negative kidney recipient - a case report
  • DOI:
    10.1016/j.ajt.2024.12.180
  • Published:
    2025-01-01
  • Journal:
  • Impact factor:
    8.200
  • Authors:
    Shengliang He;Sung-Hoon Kim;Tomohiro Tanaka;David Thomsen;Christie Thomas;Daniel Katz;Hassan Aziz;Alan Reed
  • Corresponding author:
    Alan Reed
Validation of Remote Administration of Social Cognitive Assessments in Pregnant Women
  • DOI:
    10.1016/j.biopsych.2021.02.558
  • Published:
    2021-05-01
  • Journal:
  • Impact factor:
  • Authors:
    Emma Smith;Danielle Torres;Deborah Li;Vignesh Rajasekaran;Margaret McClure;Daniel Katz;Julie Spicer;Nicole Derish;Antonia S. New;Erin A. Hazlett;Harold W. Koenigsberg;Maria de las Mercedes Perez-Rodriguez
  • Corresponding author:
    Maria de las Mercedes Perez-Rodriguez
375. Social Cognition in Pregnancy and Postpartum and an Association With Maternal Caregiving
  • DOI:
    10.1016/j.biopsych.2023.02.615
  • Published:
    2023-05-01
  • Journal:
  • Impact factor:
  • Authors:
    Emma Smith;Matina Kakalis;Juliana Camacho Castro;Kendall Moore;Samantha Miyares;Cristela Lopez;Sarah Garikana;Madeleine Carter;Leif Alino;Maeve McClure;Marie Balemian;Harold W. Koenigsberg;Nakiyah Knibbs;Luciana Vieira;Rebecca H. Jessel;Andres Ramirez-Zamudio;Anna Rommel;Robert Pietrzak;Veerle Bergink;Daniel Katz
  • Corresponding author:
    Daniel Katz
Quantifying Pollen Forecast Accuracy: An Assessment Of Private Sector Predictions In New York
  • DOI:
    10.1016/j.jaci.2023.11.355
  • Published:
    2024-02-01
  • Journal:
  • Impact factor:
    11.200
  • Authors:
    Daniel Katz;Kyle Edwards;Sida Huang;Guy Robinson
  • Corresponding author:
    Guy Robinson
Ezra Pound’s Provincial Provence: Arnaut Daniel, Gavin Douglas, and the Vulgar Tongue
  • DOI:
    10.1215/00267929-1589167
  • Published:
    2012
  • Journal:
  • Impact factor:
    0.4
  • Authors:
    Daniel Katz
  • Corresponding author:
    Daniel Katz

Other grants by Daniel Katz

Collaborative Research: EAGER: Characterizing Research Software from NSF Awards
  • Award number:
    2211279
  • Fiscal year:
    2022
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
CIF: Small: RUI: Highly Nonlinear and Pseudorandom Structures for Communications and Sensing
  • Award number:
    2206454
  • Fiscal year:
    2022
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: Sustainability: A Community-Centered Approach for Supporting and Sustaining Parsl
  • Award number:
    2209920
  • Fiscal year:
    2022
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: funcX: A Function Execution Service for Portability and Performance
  • Award number:
    2004932
  • Fiscal year:
    2020
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
CIF: Small: RUI: Low Correlation and Highly Nonlinear Structures for Communications and Sensing
  • Award number:
    1815487
  • Fiscal year:
    2018
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
REU Site: INCLUSION - Incubating a New Community of Leaders Using Software, Inclusion, Innovation, Interdisciplinary and OpeN-Science
  • Award number:
    1659702
  • Fiscal year:
    2017
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Kansas-Missouri-Nebraska Commutative Algebra Conference (KUMUNU 2016)
  • Award number:
    1645050
  • Fiscal year:
    2016
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
The 4th Workshop on Sustainable Software for Science: Best Practices and Experiences (WSSSPE4)
  • Award number:
    1648293
  • Fiscal year:
    2016
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Promoting Action to Build Research Communities in the Age of Open Science
  • Award number:
    1645571
  • Fiscal year:
    2016
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
RUI: Extremal Combinatorics of Patterns, Correlation, and Structure
  • Award number:
    1500856
  • Fiscal year:
    2015
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) grants

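Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research (细胞研究)
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Project type:
    General Program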

Similar overseas grants

Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403312
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2414474
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402947
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403313
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402946
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403090
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
  • Award number:
    2403399
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403089
  • Fiscal year:
    2024
  • Funding amount:
    $206,800
  • Project type:
    Standard Grant