CAREER: Towards Scalable Error Detection for Parallel Software Systems on Emerging Computing Platforms

Project Abstract

Extreme-scale computing introduces many new challenges to parallel program design: a single computation may involve hundreds of thousands of processes with multiple levels of parallelism. Debugging such large-scale parallel programs is very difficult, and scalable, lightweight correctness tools are critical to meeting this challenge.

This research seeks to design innovative algorithms and develop a scalable toolkit that efficiently and effectively analyzes parallel programs and detects potential errors on emerging heterogeneous and extreme-scale computing platforms. Specifically, the objectives of the research are to: (1) develop instrumentation tools and optimized monitoring systems to support building error-detection tools, (2) design optimization strategies and techniques to improve scalability and reduce overhead, (3) integrate static and dynamic program analyses to improve reporting accuracy and code coverage, (4) design more accurate and efficient detection techniques for large-scale parallel systems, and (5) investigate domain-specific techniques for error detection and optimization.

This research will help the development of extreme-scale parallel programs for scientific computing and uncover hard-to-find errors at an early stage. It will significantly reduce the burden of tedious debugging, so researchers can focus on scientific problems. The toolkit targets general computing platforms, from local clusters to extreme-scale supercomputers. In the education thrust, the research results will facilitate the development of new courses and enhance existing ones. High-school, undergraduate, and graduate students will have opportunities to get involved in the research.

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Liqiang Wang

Tensile properties of in situ synthesized (TiB+LaO)/Ti composite
Protein hydrogel networks: A unique approach to heteroatom self-doped hierarchically porous carbon structures as an efficient ORR electrocatalyst in both basic and acidic conditions
  • DOI: 10.1016/j.apcatb.2019.01.050
  • Published: 2019
  • Impact factor: 0
  • Authors: Liqiang Wang; Kaixin Liang; Liu Deng; You-Nian Liu
  • Corresponding author: You-Nian Liu
FF-LINS: A Consistent Frame-to-Frame Solid-State-LiDAR-Inertial State Estimator
  • DOI: 10.1109/lra.2023.3329625
  • Published: 2023
  • Impact factor: 5.2
  • Authors: Hailiang Tang; Tisheng Zhang; X. Niu; Liqiang Wang; Linfu Wei; Jingnan Liu
  • Corresponding author: Jingnan Liu
Constraining the genesis of tungsten mineralization in the Jiaoxi deposit, Tibet: A fluid inclusion and H, O, S and Pb isotope investigation
  • DOI: 10.1016/j.oregeorev.2021.104448
  • Published: 2021-08
  • Impact factor: 3.3
  • Authors: Yong Wang; Juxing Tang; Liqiang Wang; Jan Marten Huizenga; M. Santosh
  • Corresponding author: M. Santosh
A modulated sparse random matrix for high-resolution and high-speed 3D compressive imaging through a multimode fiber
  • DOI: 10.1016/j.scib.2022.03.017
  • Published: 2022
  • Impact factor: 18.9
  • Authors: Zhenyu Dong; Zhong Wen; Chenlei Pang; Liqiang Wang; Lan Wu; Xu Liu; Qing Yang
  • Corresponding author: Qing Yang

Other grants awarded to Liqiang Wang

ICE-T:RI: Towards End-to-End Resource Optimization for Time-Critical Computing Using Reinforcement Learning and Program Analysis
  • Award number: 1836881
  • Fiscal year: 2018
  • Award amount: $252,200
  • Award type: Standard Grant
RI: Medium: Collaborative Research: Understanding and Editing Visual Sentiment
  • Award number: 1704309
  • Fiscal year: 2017
  • Award amount: $252,200
  • Award type: Continuing Grant
CSR:Small: Towards Reliable Concurrent Computing Using Hybrid Program Analysis
  • Award number: 1118059
  • Fiscal year: 2011
  • Award amount: $252,200
  • Award type: Standard Grant
CAREER: Towards Scalable Error Detection for Parallel Software Systems on Emerging Computing Platforms
  • Award number: 1054834
  • Fiscal year: 2011
  • Award amount: $252,200
  • Award type: Standard Grant
Enabling Large-Scale, High-Resolution, and Real-Time Earthquake Simulations on Petascale Parallel Computers
  • Award number: 0941735
  • Fiscal year: 2009
  • Award amount: $252,200
  • Award type: Standard Grant

Similar international grants

CAREER: Towards Efficient and Scalable Zero-Knowledge Proofs
  • Award number: 2401481
  • Fiscal year: 2023
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Towards Efficient and Scalable Zero-Knowledge Proofs
  • Award number: 2144625
  • Fiscal year: 2022
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Towards Scalable and Robust Inference of Phylogenetic Networks
  • Award number: 2144367
  • Fiscal year: 2022
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Towards Reliable Operating Systems through Scalable Control- and Data-Flow Analysis
  • Award number: 2145888
  • Fiscal year: 2022
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Towards Scalable, Low-Power, Wide Area Networks
  • Award number: 2142978
  • Fiscal year: 2022
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Scalable, high-precision optoelectronic lab-on-a-chip towards next-generation precision therapeutics
  • Award number: 2046031
  • Fiscal year: 2021
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Towards a Principled Framework for Resilient, Data Efficient and Scalable Reinforcement Learning for Control
  • Award number: 2045783
  • Fiscal year: 2021
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Scalable Distributed MIMO: Towards Density-Proportional Capacity Scaling for Infrastructure Wireless Networks
  • Award number: 1854472
  • Fiscal year: 2018
  • Award amount: $252,200
  • Award type: Continuing Grant
CAREER: Towards Fast and Scalable Algorithms for Big Proteogenomics Data Analytics
  • Award number: 1925960
  • Fiscal year: 2018
  • Award amount: $252,200
  • Award type: Standard Grant
CAREER: Towards Fast and Scalable Algorithms for Big Proteogenomics Data Analytics
  • Award number: 1651724
  • Fiscal year: 2017
  • Award amount: $252,200
  • Award type: Standard Grant