SHF: Small: Empirical Autotuning of Parallel Computation for Scalable Hybrid Systems

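Basic information

  • Award number:
    1527706
  • Principal investigator:
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Duration:
    2015-07-15 to 2019-06-30
  • Status:
    Completed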

Project abstract

Today, scientific and engineering computing is synonymous with parallel computing. Applications such as climate modeling, drug design, and aircraft design use very large supercomputer installations, with power consumption measured in megawatts and the cost of electricity measured in millions of dollars. At the same time, every parallel application requires some level of tuning to ensure that the software is mapped appropriately to the hardware. Otherwise, suboptimal performance can lead to lost cycles, kilowatt-hours, and, ultimately, dollars. Tuning the application by making repeated runs is also wasteful at very large scale. The DARE project addresses this problem by tuning the application through modeling and simulation of its behavior at very large scale, rather than actually running it. The resources required for tuning are therefore marginal compared to those consumed in production runs. DARE is based on the observation that the same approach that replaces a wind tunnel with a computer simulation of the airfoil can be applied to the software itself.

Two aspects of today's high-end computing landscape make the DARE work unique: 1) the prevalence of hardware accelerators, such as Graphics Processing Units and Xeon Phi co-processors, and 2) the adoption of task-based, dynamic work scheduling systems as an alternative to traditional, lock-step parallel programming models. In particular, DARE combines three components into a refinement loop: a hardware analysis component, a kernel modeling component, and a workload simulation component. The hardware analysis component extracts basic hardware information, such as processing power and data link speed. The kernel modeling component provides performance models of the serial kernels that constitute the building blocks of the parallel program. Finally, the simulation component simulates large-scale parallel workloads.

The hardware analysis component gathers basic knowledge about the system, such as: the number of CPU sockets per shared-memory node, the number of CPU cores in each socket, the cache hierarchy, the existence of hyper-threading, the number of NUMA nodes and the proximity of CPUs to NUMA nodes, the number of GPU accelerators or Xeon Phi co-processors and the capacities of their device memories, and the topology and bandwidth of data links, both within each node (buses) and between nodes (network switches). Part of this knowledge can be gathered by using appropriate query APIs, such as hwloc, netloc, PAPI, and those provided in the CUDA SDK, OpenCL SDK, and Xeon Phi SDK. Synthetic tests can be used for parameters that cannot be established in this manner.

Kernels are essentially the serial building blocks of parallel problems. Although kernels are usually characterized by serial control flow, most of the time they already rely on a high degree of data parallelism. Today's CPUs get most of their performance from SIMD parallelism, and GPUs get their performance from massive SIMT parallelism. The role of the kernel modeling component is twofold: 1) to tune kernels for maximum performance at a given granularity, and 2) to provide the kernel performance model as a function of granularity, which changes to accommodate parallel execution.

DARE turns to a stochastic time-stepping simulation to predict the performance of a dynamic runtime scheduler for two fundamental reasons: 1) building good performance models by benchmarking actual parallel runs requires a significant number of runs with significant problem sizes, which is simply too time-consuming, and 2) the impact of many tuning parameters is too complex to be modeled by sparsely sampling the tuning space and fitting simple curves or surfaces to the sample points. The answer is to replace the run with a time-stepping simulation in which a given task-based scheduler still assigns tasks to cores, but instead of invoking actual kernel tasks, control is passed to a progress-tracking simulation system. That system relies on kernel performance models to simulate the execution of the tasks and produce a virtual trace of the simulated execution. The performance advantage is twofold: 1) simulating a single run is much faster than actually making that run, and 2) many simulations can be run in parallel, allowing for fast sweeps through a large parameter search space.

DARE replaces the standard waterfall autotuning process with a process that is incremental and iterative in nature. The power of the DARE approach lies in the mutual refinement loop, where each of the three phases is capable of massively pruning the search space for the other two. As a result, very high quality models can be built for a particular workload, since time is spent refining the model for the conditions that actually apply, rather than sampling the search space in areas never touched at runtime.
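
The idea of replacing real runs with simulated ones can be illustrated with a small Python sketch. It is not DARE's implementation: the kernel model `kernel_time`, its parameters (`peak_gflops`, `overhead_s`, `half_perf_tile`), the GEMM-like workload of independent tile tasks, and the greedy scheduler are all assumptions made for illustration. It is also an event-driven simplification rather than the time-stepping simulator the abstract describes. Instead of running a kernel, the scheduler asks the model how long the task would take, adds random noise, and records the result in a virtual trace.

```python
import heapq
import random


def kernel_time(tile, peak_gflops=50.0, overhead_s=2e-5, half_perf_tile=64):
    """Hypothetical kernel model: seconds for one tile^3 GEMM-like task.

    Efficiency grows with tile size (small tiles under-use SIMD units),
    and every task pays a fixed scheduling overhead. All constants are
    made-up illustration values, not measurements.
    """
    flops = 2.0 * tile ** 3
    efficiency = tile / (tile + half_perf_tile)
    return overhead_s + flops / (peak_gflops * 1e9 * efficiency)


def simulate(n, tile, workers, noise=0.05, seed=0):
    """Simulate greedy dynamic scheduling of independent tile tasks.

    Returns (makespan_seconds, trace); trace is the "virtual trace",
    a list of (worker, task_id, start, end) tuples.
    """
    rng = random.Random(seed)
    tiles_per_dim = -(-n // tile)      # ceiling division
    num_tasks = tiles_per_dim ** 3     # tile-level multiply-add tasks
    # Min-heap of (time the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(workers)]
    heapq.heapify(free_at)
    trace = []
    for task_id in range(num_tasks):
        start, worker = heapq.heappop(free_at)  # earliest idle worker
        # No kernel is executed: the model supplies the duration.
        duration = kernel_time(tile) * (1.0 + rng.uniform(-noise, noise))
        end = start + duration
        trace.append((worker, task_id, start, end))
        heapq.heappush(free_at, (end, worker))
    makespan = max(end for _, _, _, end in trace)
    return makespan, trace


def sweep(n, workers, tiles):
    """Predicted makespan for each candidate tile size."""
    return {tile: simulate(n, tile, workers)[0] for tile in tiles}


if __name__ == "__main__":
    results = sweep(n=1024, workers=16, tiles=[32, 64, 128, 256, 512])
    for tile, seconds in sorted(results.items()):
        print(f"tile={tile:4d}  predicted makespan={seconds:.4f} s")
    print("best tile size:", min(results, key=results.get))
```

Under this toy model, small tiles pay for overhead and low kernel efficiency while large tiles leave workers idle, so the sweep exposes a granularity trade-off without executing a single real kernel. Because each call to `simulate` is independent, the candidate configurations could also be evaluated in parallel.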

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Jack Dongarra

The co-evolution of computational physics and high-performance computing
  • DOI:
    10.1038/s42254-024-00750-z
  • Published:
    2024-08-23
  • Journal:
  • Impact factor:
    39.500
  • Authors:
    Jack Dongarra;David Keyes
  • Corresponding author:
    David Keyes
hipMAGMA v1.0
  • DOI:
  • Published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cade Brown;Ahmad Abdelfattah;Stanimire Tomov;Jack Dongarra
  • Corresponding author:
    Jack Dongarra
The eigenvalue problem for Hermitian matrices with time reversal symmetry
  • DOI:
    10.1016/0024-3795(84)90068-5
  • Published:
    1984
  • Journal:
  • Impact factor:
    1.1
  • Authors:
    Jack Dongarra;J. R. Gabriel;D. D. Koelling;James Hardy Wilkinson
  • Corresponding author:
    James Hardy Wilkinson
Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU clusters
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ichitaro Yamazaki;Ahmad Abdelfattah;Akihiro Ida;Satoshi Ohshima;Stanimire Tomov;Rio Yokota;Jack Dongarra
  • Corresponding author:
    Jack Dongarra
Self-healing network for scalable fault-tolerant runtime environments
  • DOI:
    10.1016/j.future.2009.04.001
  • Published:
    2010-03-01
  • Journal:
  • Impact factor:
  • Authors:
    Thara Angskun;Graham Fagg;George Bosilca;Jelena Pješivac-Grbović;Jack Dongarra
  • Corresponding author:
    Jack Dongarra

Other grants held by Jack Dongarra

Travel: Workshop on Clusters, Clouds, and Data Analytics for Scientific Computing 2024
  • Award number:
    2336813
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Workshop on Clusters, Clouds, and Data Analytics for Scientific Computing
  • Award number:
    2001329
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Workshop on Clusters, Clouds, and Data Analytics in Scientific Computing
  • Award number:
    1800946
  • Fiscal year:
    2018
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Toward a common digital continuum platform for big data and extreme-scale computing (BDEC2)
  • Award number:
    1849625
  • Fiscal year:
    2018
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Collaborative Research: ACI-CDS&E: Highly Parallel Algorithms and Architectures for Convex Optimization for Realtime Embedded Systems (CORES)
  • Award number:
    1709069
  • Fiscal year:
    2017
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Workshop on Clusters, Clouds and Data Analytics in Scientific Computing
  • Award number:
    1606551
  • Fiscal year:
    2016
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Collaborative Research: EMBRACE: Evolvable Methods for Benchmarking Realism through Application and Community Engagement
  • Award number:
    1535025
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Award type:
    Standard Grant
SI2-SSI: Collaborative Proposal: Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX)
  • Award number:
    1450429
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Award type:
    Standard Grant
CSR:Medium:Collaborative Research: SparseKaffe: high-performance, auto-tuned, energy-aware algorithms for sparse direct methods on modern heterogeneous architectures
  • Award number:
    1514286
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Award type:
    Continuing Grant
EAGER: Collaborative Research: Memristive Accelerator for Extreme Scale Linear Solvers
  • Award number:
    1548093
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Award type:
    Standard Grant

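Similar National Natural Science Foundation of China (NSFC) grants

Forensic application of circadian-rhythm small RNAs in estimating the time of bloodstain formation
  • Award number:
  • Approval year:
    2024
  • Amount:
    CNY 0
  • Award type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Approval year:
    2022
  • Amount:
    CNY 100,000
  • Award type:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Approval year:
    2020
  • Amount:
    CNY 240,000
  • Award type:
    Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Approval year:
    2019
  • Amount:
    CNY 580,000
  • Award type:
    General Program
Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Approval year:
    2019
  • Amount:
    CNY 210,000
  • Award type:
    Young Scientists Fund
Functions and mechanisms of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Approval year:
    2018
  • Amount:
    CNY 560,000
  • Award type:
    General Program
Molecular mechanism of pigeon milk secretion, analyzed by small RNA sequencing
  • Award number:
    31802058
  • Approval year:
    2018
  • Amount:
    CNY 260,000
  • Award type:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Approval year:
    2017
  • Amount:
    CNY 600,000
  • Award type:
    General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis, based on small RNA-seq
  • Award number:
    81704176
  • Approval year:
    2017
  • Amount:
    CNY 200,000
  • Award type:
    Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Approval year:
    2016
  • Amount:
    CNY 850,000
  • Award type:
    Major Research Plan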

Similar overseas grants

The Empirical Study of Gender (EGEN) Research Network: Small Research Prizes to Graduate Students and Early Career Faculty
  • Award number:
    2215500
  • Fiscal year:
    2022
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Empirical Studies on Inclusiveness and Exclusiveness of Sharing of Technologies in East African Small and Medium-sized Manufacturers
  • Award number:
    21H03706
  • Fiscal year:
    2021
  • Amount:
    $450,000
  • Award type:
    Grant-in-Aid for Scientific Research (B)
RI: Small: New Directions in Probabilistic Deep Learning: Exponential Families, Bayesian Nonparametrics and Empirical Bayes
  • Award number:
    2127869
  • Fiscal year:
    2021
  • Amount:
    $450,000
  • Award type:
    Standard Grant
NSF-BSF: RI: Small: Efficient Transformers via Formal and Empirical Analysis
  • Award number:
    2113530
  • Fiscal year:
    2021
  • Amount:
    $450,000
  • Award type:
    Standard Grant
On Property Liability Insurance Demand of Small and Medium-sized Enterprises- Empirical Evidence Based on Finance and Insurance Theories-
  • Award number:
    20K01756
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
CIF: Small: Fundamental Limits of Empirical Risk Minimization in High Dimensions: A Unifying Gaussian Processes Approach
  • Award number:
    2009030
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Award type:
    Standard Grant
OAC Core: Small: Devising Data-driven Methodologies by Employing Large-scale Empirical Data to Fingerprint, Attribute, Remediate and Analyze Internet-scale IoT Maliciousness
  • Award number:
    1953051
  • Fiscal year:
    2019
  • Amount:
    $450,000
  • Award type:
    Standard Grant
Theoretical and Empirical Research on Project-Based Budgeting System for Small and Medium Civil Engineering Construction Companies
  • Award number:
    19K01993
  • Fiscal year:
    2019
  • Amount:
    $450,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
OAC Core: Small: Devising Data-driven Methodologies by Employing Large-scale Empirical Data to Fingerprint, Attribute, Remediate and Analyze Internet-scale IoT Maliciousness
  • Award number:
    1907821
  • Fiscal year:
    2019
  • Amount:
    $450,000
  • Award type:
    Standard Grant
NeTS: Small: Exploring the Design, Implementation, Operation Issues of Cellular IoT via Formal Analysis and Empirical Validation
  • Award number:
    1814551
  • Fiscal year:
    2018
  • Amount:
    $450,000
  • Award type:
    Standard Grant