Collaborative Research: SI2-SSI: EVOLVE: Enhancing the Open MPI Software for Next Generation Architectures and Applications

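Basic Information

  • Award number:
    1664142
  • Principal investigator:
  • Amount:
    $1.5662 million
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-06-01 to 2022-05-31
  • Status:
    Completed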

Project Abstract

For nearly two decades, the Message Passing Interface (MPI) has been an essential part of the High-Performance Computing ecosystem and consequently a key enabler for important scientific breakthroughs. It is a fundamental building block for most large-scale simulations in physics, chemistry, biology, materials science, and engineering. Open MPI is an open source implementation of the MPI specification, widely used and adopted by the research community as well as industry. The Open MPI library is jointly developed and maintained by a consortium of academic institutions, national labs and industrial partners. It is installed on virtually all large-scale computer systems in the US as well as in the rest of the world. The goal of this project is to enhance and modernize the Open MPI library in the context of the ongoing evolution of modern computer systems, and to ensure its future operability on all upcoming architectures. We aim to implement fundamental software techniques that can be used on many-core systems to execute MPI-based parallel applications more efficiently, and to tolerate process and memory failures at all scales, from current systems up to the extreme scales expected before the end of the decade.

Open MPI is an open source implementation of the Message Passing Interface (MPI) specification. The MPI API is currently being extended to consider the needs of application developers in terms of efficiency, productivity and resilience. The project will also support academic involvement in the design, development and evaluation of the Open MPI software, and ensure academic presence in the MPI Forum. The goal of this proposal is to enhance the Open MPI software library, focusing on two aspects:

(1) Extend Open MPI to support new features of the MPI specification. Open MPI will continue to support all new features of current and upcoming MPI specifications. The two most significant areas within the context of this proposal are (a) extensions to better support hybrid programming models and (b) support for fault tolerance in MPI applications. To improve support for hybrid programming models, the MPI Forum is currently considering introducing the notion of MPI Endpoints, which could be used by different threads of an MPI rank to instantiate multiple separate communication contexts. The goal within this project is to develop an implementation of endpoints that supports effective hybrid programming models, and to extend the concept to other aspects of parallel applications such as file I/O operations. One of the project partners (UTK) leads the current proposal in the MPI Forum to expose failures and ensure the continued execution of MPI applications. In the context of this SSI proposal, the goal is to harden, improve, and expand the existing User-Level Failure Mitigation (ULFM) implementation in Open MPI and thus enable end-users to design application-specific resilience approaches for future platforms.

(2) Enhance the Open MPI core to support new architectures and improve scalability. While Open MPI has demonstrated very good scalability in the past, there is significant work to be done to ensure similarly good performance on future architectures. Specifically, we propose a groundbreaking rework of the startup environment that will improve process launch scalability, increase support for asynchronous progress of operations, enable support for accelerators, and reduce sensitivity to system noise. The project would also enhance the support for file I/O operations as part of the Open MPI package by expanding our work on highly scalable collective I/O operations through delegation and by exploring the use of burst buffers as temporary storage.

Project Outcomes

Journal articles (25)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A failure detector for HPC platforms
Using Advanced Vector Extensions AVX-512 for MPI Reductions
Predicting MPI Collective Communication Performance Using Machine Learning
ADAPT: an event-based adaptive collective communication framework
DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models
  • DOI:
    10.1109/ccgrid49817.2020.00-76
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nicolae, B.;Li, J.;Wozniak, J.;Bosilca, G.;Dorier, M.;Cappello, F.
  • Corresponding author:
    Cappello, F.

Other publications by George Bosilca

An evaluation of User-Level Failure Mitigation support in MPI
  • DOI:
    10.1007/s00607-013-0331-3
  • Publication date:
    2013-05-29
  • Journal:
  • Impact factor:
    2.800
  • Authors:
    Wesley Bland;Aurelien Bouteiller;Thomas Herault;Joshua Hursey;George Bosilca;Jack J. Dongarra
  • Corresponding author:
    Jack J. Dongarra
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors
Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms
  • DOI:
    10.1016/j.jpdc.2013.01.015
  • Publication date:
    2013-07-01
  • Journal:
  • Impact factor:
  • Authors:
    Teng Ma;George Bosilca;Aurelien Bouteiller;Jack J. Dongarra
  • Corresponding author:
    Jack J. Dongarra
Self-healing network for scalable fault-tolerant runtime environments
  • DOI:
    10.1016/j.future.2009.04.001
  • Publication date:
    2010-03-01
  • Journal:
  • Impact factor:
  • Authors:
    Thara Angskun;Graham Fagg;George Bosilca;Jelena Pješivac-Grbović;Jack Dongarra
  • Corresponding author:
    Jack Dongarra

Other grants held by George Bosilca

Collaborative Research: Frameworks: Production quality Ecosystem for Programming and Executing eXtreme-scale Applications (EPEXA)
  • Award number:
    1931384
  • Fiscal year:
    2019
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
OAC Core: Small: Collaborative Research: Scalable Run-Time for Highly Parallel, Heterogeneous Systems
  • Award number:
    1909015
  • Fiscal year:
    2019
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
SPX: Collaborative Research: Cross-layer Application-Aware Resilience at Extreme Scale (CAARES)
  • Award number:
    1725692
  • Fiscal year:
    2017
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
Collaborative Research: SI2-SSI:Task-Based Environment for Scientific Simulation at Extreme Scale (TESSE)
  • Award number:
    1450300
  • Fiscal year:
    2015
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
SI2-SSE: Collaborative Research: ADAPT: Next Generation Message Passing Interface (MPI) Library - Open MPI
  • Award number:
    1339820
  • Fiscal year:
    2013
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
G8 Initiative: Collaborative Research: ECS: Enabling Climate Simulation at Extreme Scale
  • Award number:
    1063019
  • Fiscal year:
    2011
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative: CSR-AES: System Support for Auto-tuning MPI Applications
  • Award number:
    0720678
  • Fiscal year:
    2007
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant

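Similar NSFC (National Natural Science Foundation of China) grants

Research on Quantum Field Theory without a Lagrangian Description
  • Grant number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Grant number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Grant number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Grant number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Grant number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Project type:
    General program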

Similar overseas grants

Collaborative Research: SI2-SSI: Expanding Volunteer Computing
  • Award number:
    2039142
  • Fiscal year:
    2020
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
SI2-SSI: Collaborative Research: Einstein Toolkit Community Integration and Data Exploration
  • Award number:
    2114580
  • Fiscal year:
    2020
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative Research: SI2-SSI: Expanding Volunteer Computing
  • Award number:
    2001752
  • Fiscal year:
    2019
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
  • Award number:
    1743178
  • Fiscal year:
    2018
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
  • Award number:
    1743185
  • Fiscal year:
    2018
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
  • Award number:
    1743180
  • Fiscal year:
    2018
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
  • Award number:
    1743179
  • Fiscal year:
    2018
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
  • Award number:
    1743191
  • Fiscal year:
    2018
  • Award amount:
    $1.5662 million
  • Award type:
    Continuing Grant
Collaborative Research: SI2-SSI: Expanding Volunteer Computing
  • Award number:
    1664022
  • Fiscal year:
    2017
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant
Collaborative Research: SI2-SSI: Cyberinfrastructure for Advancing Hydrologic Knowledge through Collaborative Integration of Data Science, Modeling and Analysis
  • Award number:
    1664061
  • Fiscal year:
    2017
  • Award amount:
    $1.5662 million
  • Award type:
    Standard Grant