NSCI Elements: Software - PFSTRASE - A Parallel FileSystem TRacing and Analysis SErvice to Enhance Cyberinfrastructure Performance and Reliability

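Basic Information

  • Award number:
    1835135
  • Principal investigator:
  • Amount:
    $385,900
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-10-01 to 2022-09-30
  • Project status:
    Completed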

Project Abstract

This project will develop an open-source software service, the Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE), that improves the reliability and performance of data storage systems for the nation's largest supercomputers. As simulations and computations represent reality more faithfully they grow commensurately in scale along with the size of the data they consume and generate. To handle the storage and movement of this data, supercomputing systems are built on the backbone of massively parallel data storage systems. Due to their parallel nature these storage systems are capable of moving data at hundreds of times the speed of conventional storage systems, enabling otherwise impractical computations. The performance capabilities these storage systems provide are accompanied by a complexity that results in them often functioning significantly less than optimally and even in some instances failing. This results in wasted computational time and ultimately lost scientific progress. The state of development of tools that could cast light on these problems and improve storage system reliability and performance is inadequate for current and future computing systems. PFSTRASE will fill this gap by continually and automatically monitoring storage system health and performance, providing insights through an easy-to-use interface that will improve the reliability and performance of storage and supercomputer systems.

Parallel filesystems (PFSs) are the most critical high-availability components of High Performance Computing (HPC) architectures, providing input/output (I/O) services to running computations, the environment that users and system services operate in, and storage for applications and data. Because of this central role, failure or performance degradation events in the PFS impact every user of an HPC resource. PFS events must be dealt with quickly and effectively by system administrators; however, there is typically insufficient information to establish precise causal relationships between PFS activity and events, impeding the implementation of timely and targeted remedies. To fill this information gap, an open-source Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE) that traces and analyzes the requisite data to establish causal relationships between PFS activity and both realized and imminent events will be developed. This project will implement the service for the open-source Lustre filesystem, which is the most commonly used PFS at large-scale HPC sites. Loads for specific PFS directory and file operations will be measured and incorporated into the service to construct authentic server load contributions from every job, process, and user. The service's infrastructure will continuously monitor the entire PFS and generate a real-time, seamless representation that connects contributions of jobs, processes, and users to storage server loads, network bandwidth, and storage capacities. The infrastructure will provide an easily navigable web interface that presents this data, both real-time and historical, in a visual format.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
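
The load-attribution idea in the abstract (weighting measured per-operation loads and summing them per job on each storage server) can be illustrated with a minimal Python sketch. This is not PFSTRASE code: the event tuple layout, the `OP_COST` weights, and the function names are all hypothetical assumptions made for illustration, and a real deployment would use loads measured on the actual filesystem.

```python
from collections import defaultdict

# Hypothetical relative cost of each filesystem operation on a server.
# These weights are illustrative assumptions, not measured values.
OP_COST = {"open": 2.0, "close": 1.0, "getattr": 0.5, "read": 1.5, "write": 3.0}


def attribute_load(events):
    """Sum weighted operation counts per (server, job).

    `events` is an iterable of (server, job_id, user, op, count) tuples
    (an assumed layout). Returns {server: {job_id: load}}.
    """
    load = defaultdict(lambda: defaultdict(float))
    for server, job_id, _user, op, count in events:
        # Unknown operations fall back to a unit cost of 1.0.
        load[server][job_id] += OP_COST.get(op, 1.0) * count
    return {server: dict(jobs) for server, jobs in load.items()}


def top_contributors(load, server, n=3):
    """Return the n jobs with the highest attributed load on one server."""
    jobs = load.get(server, {})
    return sorted(jobs.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample events, for demonstration only.
events = [
    ("mds0", "job-101", "alice", "open", 1000),
    ("mds0", "job-102", "bob", "getattr", 400),
    ("oss1", "job-101", "alice", "write", 50),
]
load = attribute_load(events)
print(top_contributors(load, "mds0"))
```

Ranking jobs by attributed load per server is one simple way an administrator could trace a slow server back to the jobs driving it; the same aggregation could be keyed by user or process instead of job.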

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Democratizing Parallel Filesystem Monitoring

Other publications by Richard Evans

P2Y Receptor Agonists
  • DOI:
  • Published:
    2001
  • Journal:
  • Impact factor:
    0
  • Authors:
    W. Pendergast; Richard Evans
  • Corresponding author:
    Richard Evans
Why people adopt smart transportation services: an integrated model of TAM, trust and perceived risk
  • DOI:
    10.1080/03081060.2021.1943132
  • Published:
    2021-06
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    Junze Wang; Sheng Zhao; Wei Zhang; Richard Evans
  • Corresponding author:
    Richard Evans
Reflections on benchmarking NHS primary care psychological therapies and counselling
  • DOI:
  • Published:
    2006
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. Mellor; M. Barkham; Geoff Mothersole; B. Mcinnes; Richard Evans
  • Corresponding author:
    Richard Evans
Effect of Adding Telephone-Based Brief Coaching to an mHealth App (Stay Strong) for Promoting Physical Activity Among Veterans: Randomized Controlled Trial (Preprint)
  • DOI:
    10.2196/preprints.19216
  • Published:
    2020
  • Journal:
  • Impact factor:
    8.1
  • Authors:
    L. Damschroder; Lorraine R. Buis; Felicia A McCant; H. M. Kim; Richard Evans; E. Oddone; L. Bastian; Gwendolyn Hooks; Reema Kadri; Courtney White; C. Richardson; J. Gierisch
  • Corresponding author:
    J. Gierisch
Inductive general game playing
  • DOI:
    10.1007/s10994-019-05843-w
  • Published:
    2019
  • Journal:
  • Impact factor:
    7.5
  • Authors:
    Andrew Cropper; Richard Evans; Mark Law
  • Corresponding author:
    Mark Law

Other grants by Richard Evans

COLLABORATIVE RESEARCH: We are thriving: Challenging negative discourse through voices of women in project teams
  • Award number:
    2015741
  • Fiscal year:
    2020
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Size, shape and surface properties in realistic models of magnetic nanocrystals
  • Award number:
    EP/P022006/1
  • Fiscal year:
    2017
  • Funding amount:
    $385,900
  • Project type:
    Research Grant
Mapping "missing" conformations of ATP-gated P2X receptor ion channels
  • Award number:
    BB/P001076/1
  • Fiscal year:
    2016
  • Funding amount:
    $385,900
  • Project type:
    Research Grant
Cross-linking and molecular modelling to determine the structure and dynamics of the intracellular regions of ATP gated P2X receptor ion channels
  • Award number:
    BB/M000990/1
  • Fiscal year:
    2014
  • Funding amount:
    $385,900
  • Project type:
    Research Grant
Integrated mutagenesis, bio-informatic and fluorescence approaches to characterize the molecular basis of antagonist action at P2X7 receptors for ATP
  • Award number:
    MR/K027018/1
  • Fiscal year:
    2013
  • Funding amount:
    $385,900
  • Project type:
    Research Grant
Mathematics Teacher Development in Central and Northern New Hampshire
  • Award number:
    8470632
  • Fiscal year:
    1985
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Minority Institutions Science Improvement Program-Individual Institutional Project
  • Award number:
    7419640
  • Fiscal year:
    1974
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant

Similar overseas grants

Collaborative Research: Elements: Software: NSCI: Chrono-An open-source simulation platform for computational dynamics problems
  • Award number:
    1835727
  • Fiscal year:
    2019
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Elements: NSCI-Software -- A General and Effective B-Spline R-Matrix Package for Charged-Particle and Photon Collisions with Atoms, Ions, and Molecules
  • Award number:
    1834740
  • Fiscal year:
    2019
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Elements: Software: NSCI: A high performance suite of SVD related solvers for machine learning
  • Award number:
    1835821
  • Fiscal year:
    2019
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Collaborative Research: Elements: Software: NSCI: HDR: Building An HPC/HTC Infrastructure For The Synthesis And Analysis Of Current And Future Cosmic Microwave Background Datasets
  • Award number:
    1835526
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Collaborative Research: Elements: Software NSCI: Constitutive Relation Inference Toolkit (CRIKit)
  • Award number:
    1835825
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Elements: Software: NSCI: Efficient GPU Enabled QM/MM Calculations: AMBER Coupled with QUICK
  • Award number:
    1835144
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Elements: Software: NSCI: A Quantum Electromagnetics Simulation Toolbox (QuEST) for Active Heterogeneous Media by Design
  • Award number:
    1835267
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Collaborative Research: Elements: Software: NSCI: Constitutive Relation Inference Toolkit (CRIKit)
  • Award number:
    1835792
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Collaborative Research: Elements: Software: NSCI: HDR: Building An HPC/HTC Infrastructure For The Synthesis And Analysis Of Current And Future Cosmic Microwave Background Datasets
  • Award number:
    1835768
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant
Collaborative Research: Elements: Software: NSCI: HDR: Building An HPC/HTC Infrastructure For The Synthesis And Analysis Of Current And Future Cosmic Microwave Background Datasets
  • Award number:
    1835865
  • Fiscal year:
    2018
  • Funding amount:
    $385,900
  • Project type:
    Standard Grant