XPS: EXPL: CCA: Collaborative Research: Nixing Scale Bugs in HPC Applications

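Basic Information

  • Award number:
    1438963
  • Principal investigator:
  • Amount:
    $150,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2014
  • Funding country:
    United States
  • Duration:
    2014-09-01 to 2017-08-31
  • Project status:
    Completed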

Project Abstract

Large-scale simulation is a fundamental component of modern science and engineering. Unfortunately, programs written to perform simulations on large-scale parallel computers frequently suffer from software defects that result from the sheer scale and the variety of parallelization approaches employed. Especially egregious are software bugs that occur when large resource allocations (e.g., memory requests) are made. Formally based active-testing techniques are essential to locate such defects. However, these testing tools are themselves seldom run on parallel machines, let alone at large scale, making it difficult and very time-consuming to find scale bugs with high assurance. Efforts to parallelize verification tools should reuse existing technology for easy parallelization, result collection, and fault handling. Key innovations of this project include the insight that large-scale verification runs can be described through workflows, which makes it possible to take advantage of already available distributed computing platforms, in particular Swift/T from Argonne. The complementary backgrounds of the PIs are well matched with the need to push both formal aspects and distributed verification in the context of three widely used concurrency models, namely MPI, OpenMP, and CUDA.

This work will help create a public distributed formal active-testing framework. The tools and case-study software driving this research will be maintained by the PIs and released freely under open-source licenses through websites and repositories. They will facilitate large-scale debugging of scientific simulation codes by researchers and software developers in academia, government labs, and industry. The project will also generate pedagogical material and best practices, helping educate students in the use of existing workflow-based problem-solving approaches. It will help train present and future scientists, engineers, and programmers, thus assisting in maintaining our nation's leadership in computing, homeland and energy security, and STEM education.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Martin Burtscher

Real-Time Synthesis of Compression Algorithms for Scientific Data
Exploring last n value prediction
Progress toward Accelogic compression in ROOT
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    P. Canal; J. Lauret; J. González; G. Buren; I. Cali; R. Nunez; Y. Ying; Martin Burtscher
  • Corresponding author:
    Martin Burtscher
Higher-order and tuple-based massively-parallel prefix sums
Using general-purpose processor cores as prefetching engines in chip multiprocessor architectures
  • DOI:
  • Published:
    2007
  • Journal:
  • Impact factor:
    0
  • Authors:
    Martin Burtscher; I. Ganusov
  • Corresponding author:
    I. Ganusov

Other grants of Martin Burtscher

Collaborative Research: SHF: Medium: Practical and Rigorous Correctness Checking and Correctness Preservation for Irregular Parallel Programs
  • Award number:
    1955367
  • Fiscal year:
    2020
  • Award amount:
    $150,000
  • Project type:
    Continuing Grant
CSR: Medium: Collaborative Research: Programming Abstractions and Systems Support for GPU-Based Acceleration of Irregular Applications
  • Award number:
    1406304
  • Fiscal year:
    2014
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
CSR: Small: Collaborative Research: Real-Time Unobtrusive Tracing in Multicore Embedded Systems
  • Award number:
    1217231
  • Fiscal year:
    2012
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
ITR: A High-Performance Compression Infrastructure for Extended Program Traces
  • Award number:
    0312966
  • Fiscal year:
    2003
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
Collaborative Research: Affinity Directed Mobility for Location-Independent Data Access
  • Award number:
    0125987
  • Fiscal year:
    2002
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
Next-Generation Load-Value Predictors
  • Award number:
    0208567
  • Fiscal year:
    2002
  • Award amount:
    $150,000
  • Project type:
    Standard Grant

Similar international grants

XPS: EXPL: FP: Collaborative Research: SPANDAN: Scalable Parallel Algorithms for Network Dynamics Analysis
  • Award number:
    1924486
  • Fiscal year:
    2018
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: Enabling An Ecosystem of Parallel Programming Abstractions
  • Award number:
    1628929
  • Fiscal year:
    2016
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: Cache Management for Data Parallel Architecture
  • Award number:
    1628401
  • Fiscal year:
    2016
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: Hippogriff: Efficient Heterogeneous Servers for Data Centers and Cloud Services
  • Award number:
    1629395
  • Fiscal year:
    2016
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: Exploring the Design Space of Augmented Memory Controllers with Native Support for In-Memory Data Storage
  • Award number:
    1629201
  • Fiscal year:
    2016
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: Write Locality Theory and Optimization for Hybrid Memory
  • Award number:
    1629376
  • Fiscal year:
    2016
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: DSD: A Memristive Hardware Platform for Large Scale Combinatorial Optimization
  • Award number:
    1533762
  • Fiscal year:
    2015
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: CCA: Verification and Optimization Tools for Heterogeneous Memory Consistency Models
  • Award number:
    1533837
  • Fiscal year:
    2015
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
AitF: EXPL: Collaborative Research: Approximate Discrete Programming for Real-Time Systems
  • Award number:
    1535902
  • Fiscal year:
    2015
  • Award amount:
    $150,000
  • Project type:
    Standard Grant
XPS: EXPL: FP: Symmetric Queries as a Building Block for Efficient Parallel Query Evaluation
  • Award number:
    1606557
  • Fiscal year:
    2015
  • Award amount:
    $150,000
  • Project type:
    Standard Grant