Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability

合作研究:EAGER:实时策略和同步时间分配机制,以增强百亿亿次性能-可移植性和可预测性

基本信息

  • 批准号:
    2151022
  • 负责人:
  • 金额:
    $ 7.5万
  • 依托单位:
  • 依托单位国家:
    美国
  • 项目类别:
    Standard Grant
  • 财政年份:
    2022
  • 资助国家:
    美国
  • 起止时间:
    2022-06-01 至 2024-05-31
  • 项目状态:
    已结题

项目摘要

Advances throughout science and engineering have for several decades been driven by High Performance Computing (HPC), with the pace of discovery accelerating in concert with continued innovation in computing capabilities. But as semiconductor technology now faces fundamental physical limits, even while large-scale systems are reaching warehouse scales, new approaches are becoming essential to achieving efficient use of computing resources. In particular, given this divergence of scales, HPC systems have necessarily become more distributed and asynchronous (in the sense that system clocks are asynchronous), resulting in increasingly variable and unpredictable execution. While these effects are recognized as critical hindrances to HPC performance, the mechanisms are not yet fully understood. What is known, however, is that much HPC infrastructure is tasked with dealing with inefficiency derived from asynchrony, variability, and unpredictability, leading to a deep and complex hardware/software support stack. The project team's hypothesis is that while each stack element provides a local solution, it may also exacerbate the global problem: that complexity has resulted in more variability, not less, and made determining its causes more difficult. This project explores the possibility of reversing the trend of ever-increasing complexity by removing and simplifying support layers. This strategy’s achievable gains remain limited, however, while the underlying cause, execution asynchrony, remains unaddressed. The approach begins by leveraging recently developed technology that enables clocks to remain extremely accurate even when distributed on a planetary scale. Such accurate, distributed clocks serve to underpin a virtuous cycle where synchrony establishes baseline predictability, which, in turn, reduces variability, and at each stage of the cycle enables reduction in the complexity of the support stack. A benefit of this approach is that the individual steps are largely simple and can be applied directly to existing software systems. This one-year project aims to obtain early findings and practical demonstrations for the importance of synchrony and predictability to increase HPC compute efficiency and thereby improve large-scale program execution. Five tasks are conducted. The first is to demonstrate the feasibility of accurate clock distribution by augmenting existing HPC network infrastructure. The second is to demonstrate the application of synchrony in the establishing a virtuous cycle enabling simplifications to the software/system support stack. The third is to devise mechanisms to model, measure, and validate systems using the proposed methods. The fourth is to investigate the relative benefits of applying the synchrony-based virtuous cycle with respect to various application classes. The fifth is to demonstrate the overall efficacy of the proposed approach through a case study involving a production application. Overall, the project works to determine whether added synchronization through accurate clocks enables significant improvements to HPC computations in terms of how efficiently they use computational resources.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
几十年来,整个科学和工程的进步都是由高性能计算(HPC)推动的,随着计算能力的不断创新,发现的步伐也在加快。但是,由于半导体技术现在面临着基本的物理限制,即使在大规模系统达到仓库规模的时候,新方法对于实现有效利用计算资源也变得至关重要。特别是,考虑到这种规模的差异,HPC系统必然变得更加分布式和异步(从系统时钟是异步的意义上说),从而导致越来越多的可变和不可预测的执行。虽然这些影响被认为是高性能计算性能的关键障碍,但其机制尚未完全了解。然而,众所周知的是,许多HPC基础设施的任务是处理由异步、可变性和不可预测性引起的低效率,这导致了一个深刻而复杂的硬件/软件支持堆栈。项目团队的假设是,虽然每个堆栈元素提供了一个局部解决方案,但它也可能加剧全局问题:复杂性导致了更多的可变性,而不是更少,并使确定其原因变得更加困难。该项目探索了通过移除和简化支持层来扭转日益增加的复杂性趋势的可能性。然而,该策略可实现的收益仍然有限,而根本原因(执行异步)仍然没有得到解决。该方法首先利用最近开发的技术,使时钟即使在全球范围内分布也能保持极其精确。这种精确的分布式时钟支持良性循环,其中同步建立基线可预测性,从而减少可变性,并且在循环的每个阶段都可以减少支持堆栈的复杂性。这种方法的一个好处是,各个步骤在很大程度上是简单的,并且可以直接应用于现有的软件系统。这个为期一年的项目旨在获得同步和可预测性对提高HPC计算效率的重要性的早期发现和实际演示,从而改善大规模程序的执行。执行五项任务。首先是通过增强现有的高性能计算网络基础设施来证明精确时钟分配的可行性。第二是演示同步在建立良性循环中的应用,从而简化软件/系统支持堆栈。第三个是设计机制来使用建议的方法建模、测量和验证系统。第四是研究应用基于同步的良性循环对各种应用程序类的相对好处。第五步是通过涉及生产应用程序的案例研究来证明所建议方法的总体功效。总的来说,该项目致力于确定通过精确时钟添加的同步是否能够在使用计算资源的效率方面显著改善HPC计算。该奖项反映了美国国家科学基金会的法定使命,并通过使用基金会的知识价值和更广泛的影响审查标准进行评估,被认为值得支持。

项目成果

期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ monograph.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ sciAawards.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ conferencePapers.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ patent.updateTime }}

Amanda Bienz其他文献

Node aware sparse matrix-vector multiplication
节点感知稀疏矩阵向量乘法
A More Scalable Sparse Dynamic Data Exchange
更具可扩展性的稀疏动态数据交换
  • DOI:
  • 发表时间:
    2023
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Andrew Geyko;Gerald Collom;Derek Schafer;Patrick Bridges;Amanda Bienz
  • 通讯作者:
    Amanda Bienz
A Locality-Aware Bruck Allgather
具有地域意识的布鲁克·阿尔盖特 (Bruck Allgather)
Modeling Data Movement Performance on Heterogeneous Architectures
对异构架构上的数据移动性能进行建模
MPI Advance : Open-Source Message Passing Optimizations
MPI Advance:开源消息传递优化
  • DOI:
  • 发表时间:
    2023
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Amanda Bienz;Derek Schafer;A. Skjellum
  • 通讯作者:
    A. Skjellum

Amanda Bienz的其他文献

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

{{ truncateString('Amanda Bienz', 18)}}的其他基金

CAREER : Towards Exascale Performance of Parallel Applications
职业:迈向并行应用的百亿亿级性能
  • 批准号:
    2338077
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Continuing Grant

相似国自然基金

Research on Quantum Field Theory without a Lagrangian Description
  • 批准号:
    24ZR1403900
  • 批准年份:
    2024
  • 资助金额:
    0.0 万元
  • 项目类别:
    省市级项目
Cell Research
  • 批准号:
    31224802
  • 批准年份:
    2012
  • 资助金额:
    24.0 万元
  • 项目类别:
    专项基金项目
Cell Research
  • 批准号:
    31024804
  • 批准年份:
    2010
  • 资助金额:
    24.0 万元
  • 项目类别:
    专项基金项目
Cell Research (细胞研究)
  • 批准号:
    30824808
  • 批准年份:
    2008
  • 资助金额:
    24.0 万元
  • 项目类别:
    专项基金项目
Research on the Rapid Growth Mechanism of KDP Crystal
  • 批准号:
    10774081
  • 批准年份:
    2007
  • 资助金额:
    45.0 万元
  • 项目类别:
    面上项目

相似海外基金

Collaborative Research: EAGER: The next crisis for coral reefs is how to study vanishing coral species; AUVs equipped with AI may be the only tool for the job
合作研究:EAGER:珊瑚礁的下一个危机是如何研究正在消失的珊瑚物种;
  • 批准号:
    2333604
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
EAGER/协作研究:LLM 支持的 G 代码理解和检索框架
  • 批准号:
    2347624
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
EAGER/Collaborative Research: Revealing the Physical Mechanisms Underlying the Extraordinary Stability of Flying Insects
EAGER/合作研究:揭示飞行昆虫非凡稳定性的物理机制
  • 批准号:
    2344215
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
合作研究:EAGER:设计纳米材料揭示单纳米粒子光电发射间歇性机制
  • 批准号:
    2345581
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
合作研究:EAGER:设计纳米材料揭示单纳米粒子光电发射间歇性机制
  • 批准号:
    2345582
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
合作研究:EAGER:设计纳米材料揭示单纳米粒子光电发射间歇性机制
  • 批准号:
    2345583
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
Collaborative Research: EAGER: Energy for persistent sensing of carbon dioxide under near shore waves.
合作研究:EAGER:近岸波浪下持续感知二氧化碳的能量。
  • 批准号:
    2339062
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
Collaborative Research: EAGER: IMPRESS-U: Groundwater Resilience Assessment through iNtegrated Data Exploration for Ukraine (GRANDE-U)
合作研究:EAGER:IMPRESS-U:通过乌克兰综合数据探索进行地下水恢复力评估 (GRANDE-U)
  • 批准号:
    2409395
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
Collaborative Research: EAGER: The next crisis for coral reefs is how to study vanishing coral species; AUVs equipped with AI may be the only tool for the job
合作研究:EAGER:珊瑚礁的下一个危机是如何研究正在消失的珊瑚物种;
  • 批准号:
    2333603
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
EAGER/协作研究:LLM 支持的 G 代码理解和检索框架
  • 批准号:
    2347623
  • 财政年份:
    2024
  • 资助金额:
    $ 7.5万
  • 项目类别:
    Standard Grant
{{ showInfoDetail.title }}

作者:{{ showInfoDetail.author }}

知道了