Automatic Performance Tuning of Numerical Kernels

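Basic Information

  • Award number:
    0090127
  • Principal investigator:
  • Amount:
    $497,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2001
  • Funding country:
    United States
  • Project period:
    2001-07-01 to 2005-06-30
  • Project status:
    Completed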

Project Abstract

Large-scale simulations in computational engineering and science often spend a great deal of their time in a few computational kernels, such as dense or sparse matrix-vector products, relaxation on a structured or unstructured mesh, or the computation of forces between pairs of attracting or repelling particles. There has been a great deal of work on high-performance libraries for these applications, including dense and sparse linear algebra, multigrid methods, and n-body techniques.
One idea established in these application-level libraries is to organize the computations around a set of numerical kernels, on the assumption that these kernels will be highly optimized on each hardware platform of interest. The best-known example of this approach is the BLAS (Basic Linear Algebra Subprograms), which are used in building LAPACK, ScaLAPACK, and other libraries; the BLAS are implemented by hardware vendors and are highly tuned to the memory hierarchy of each machine.
However, this approach is limited by the growing number of kernels, the large number of machines, the increasing depth of memory hierarchies and complexity of processors, and the difficulty of performance-tuning each kernel on each machine. The great majority of these kernels can be sped up substantially by machine-specific tuning, but hand tuning takes weeks or months of a skilled engineer's time, and the work must be repeated for each micro-architecture or operating system change.
This research will work to automate the architecture-dependent tuning of numerical kernels, replacing the current hand-tuning process with a semi-automated search procedure. Prototypes of this approach exist for dense matrix multiplication (ATLAS and PHiPAC), FFTs (FFTW), and sparse matrix-vector multiplication (Sparsity). These results show that we can frequently match or even beat hand-tuned vendor code on the kernels attempted. These systems use a hand-written "search-directed code generator" (SDCG) to produce many different implementations of a single kernel, all of which are run on each architecture, with the fastest one being selected. This approach will be extended to a much wider range of numerical kernels by combining compiler technology with algorithm-specific transformation rules to automate the production of these SDCGs.
Ultimately, the technology is expected to be useful in conventional compilers, provided that appropriate abstract data types or annotations are used to side-step the very difficult or "impossible" dependency analysis needed to justify the desired code transformations. This work should also stimulate research into new high-level numerical methods and architectures, both of which are limited by the lack of highly tuned kernels.
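
The generate-run-select loop described in the abstract can be illustrated with a toy sketch. This is not how ATLAS, PHiPAC, FFTW, or Sparsity are implemented (those systems emit and compile low-level code); it is a minimal pure-Python illustration, assuming the only tunable parameter is the block size of a blocked matrix multiply. The names `make_blocked_matmul`, `time_kernel`, and `search` are hypothetical and chosen only for this example.

```python
import random
import time


def make_blocked_matmul(block):
    """Generate one matrix-multiply variant specialized to a block size.

    Stands in for the "code generator": each call yields a different
    implementation of the same kernel (hypothetical helper).
    """
    def matmul(a, b):
        n, m, p = len(a), len(b), len(b[0])
        c = [[0.0] * p for _ in range(n)]
        # Iterate over block-sized tiles to improve locality.
        for ii in range(0, n, block):
            for kk in range(0, m, block):
                for jj in range(0, p, block):
                    for i in range(ii, min(ii + block, n)):
                        row_a, row_c = a[i], c[i]
                        for k in range(kk, min(kk + block, m)):
                            aik, row_b = row_a[k], b[k]
                            for j in range(jj, min(jj + block, p)):
                                row_c[j] += aik * row_b[j]
        return c
    return matmul


def time_kernel(kernel, a, b, repeats=3):
    """Return the best wall-clock time over a few runs of one variant."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def search(candidate_blocks, n=48, seed=0):
    """Run every generated variant on this machine and keep the fastest."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    timings = {
        block: time_kernel(make_blocked_matmul(block), a, b)
        for block in candidate_blocks
    }
    best_block = min(timings, key=timings.get)
    return best_block, make_blocked_matmul(best_block), timings
```

Calling `search([4, 8, 16, 32])` returns the winning block size, the corresponding kernel, and the measured timings. The winner depends on the machine it runs on, which is the point of the approach: the selection is made empirically rather than from a hand-derived performance model.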

Project Outcomes

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Katherine Yelick

SPX: Collaborative Research: Global Address Programming with Accelerators
  • Award number:
    1823034
  • Fiscal year:
    2018
  • Funding amount:
    $497,700
  • Project type:
    Standard Grant
Student Travel Support for the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT); San Francisco, CA; October 18 - 21, 2015
  • Award number:
    1546951
  • Fiscal year:
    2015
  • Funding amount:
    $497,700
  • Project type:
    Standard Grant
Simulations And Analysis of Cosmic Microwave Background Polarization Data At The Petascale And Beyond
  • Award number:
    0905099
  • Fiscal year:
    2009
  • Funding amount:
    $497,700
  • Project type:
    Standard Grant
Collaborative Research: CRI: IAD: Development of a Research Infrastructure for the Multithreaded Computing Community Using the Cray Eldorado Platform
  • Award number:
    0709254
  • Fiscal year:
    2007
  • Funding amount:
    $497,700
  • Project type:
    Continuing Grant
Automated Perturbation Theory for Hamiltonian Systems
  • Award number:
    9712410
  • Fiscal year:
    1997
  • Funding amount:
    $497,700
  • Project type:
    Standard Grant
Software Systems for Irregular Application on Scalable Multiprocessors
  • Award number:
    9210260
  • Fiscal year:
    1992
  • Funding amount:
    $497,700
  • Project type:
    Continuing Grant

Similar Overseas Grants

Development of performance parameter optimization tools for automatic tuning
  • Award number:
    23K11126
  • Fiscal year:
    2023
  • Funding amount:
    $497,700
  • Project type:
    Grant-in-Aid for Scientific Research (C)
CAREER: Mining Hints from Text Documents to Guide Automated Database Performance Tuning
  • Award number:
    2239326
  • Fiscal year:
    2023
  • Funding amount:
    $497,700
  • Project type:
    Continuing Grant
Autonomous Discovery of Performance Tuning Insights Using Autotuning
  • Award number:
    23K16890
  • Fiscal year:
    2023
  • Funding amount:
    $497,700
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Tuning the Surface Chemistry of Structured Materials for Enhanced Performance
  • Award number:
    RGPIN-2020-06522
  • Fiscal year:
    2022
  • Funding amount:
    $497,700
  • Project type:
    Discovery Grants Program - Individual
Tuning mesenchymal stem cell lifespan, performance, and differentiation
  • Award number:
    DP220101644
  • Fiscal year:
    2022
  • Funding amount:
    $497,700
  • Project type:
    Discovery Projects
Tuning the Surface Chemistry of Structured Materials for Enhanced Performance
  • Award number:
    RGPIN-2020-06522
  • Fiscal year:
    2021
  • Funding amount:
    $497,700
  • Project type:
    Discovery Grants Program - Individual
Tuning Spaces: How parametric design, computer simulation, and digital fabrication will impact the acoustic performance of architecture
  • Award number:
    RGPIN-2016-06356
  • Fiscal year:
    2021
  • Funding amount:
    $497,700
  • Project type:
    Discovery Grants Program - Individual
Optimising catalyst performance by tuning adsorption with light
  • Award number:
    DP200102652
  • Fiscal year:
    2020
  • Funding amount:
    $497,700
  • Project type:
    Discovery Projects
Fine-tuning branched-chain amino acid to reduced nitrogen excretion while maximizing growth performance of pigs
  • Award number:
    543632-2019
  • Fiscal year:
    2020
  • Funding amount:
    $497,700
  • Project type:
    Collaborative Research and Development Grants
Tuning Spaces: How parametric design, computer simulation, and digital fabrication will impact the acoustic performance of architecture
  • Award number:
    RGPIN-2016-06356
  • Fiscal year:
    2020
  • Funding amount:
    $497,700
  • Project type:
    Discovery Grants Program - Individual