Automatic Performance Tuning of Numerical Kernels

Basic Information

  • Award number:
    0090127
  • Principal investigator:
    Katherine Yelick
  • Amount:
    $497,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2001
  • Funding country:
    United States
  • Period:
    2001-07-01 to 2005-06-30
  • Status:
    Completed

Project Abstract

Large-scale simulations in computational science and engineering often spend most of their running time in a few computational kernels, such as dense or sparse matrix-vector products, relaxation on a structured or unstructured mesh, or the computation of forces between pairs of attracting or repelling particles. A great deal of work has gone into building high-performance libraries for these applications, including dense and sparse linear algebra, multigrid methods, and n-body techniques. One idea established in these application-level libraries is to organize the computation around a set of numerical kernels, on the assumption that those kernels will be highly optimized on each hardware platform of interest. The best-known example of this approach is the BLAS (Basic Linear Algebra Subprograms), which are used to build LAPACK, ScaLAPACK, and other libraries; the BLAS are implemented by hardware vendors and highly tuned to the memory hierarchy of each machine.
However, this approach is limited by the growing number of kernels, the large number of machines, the increasing depth of memory hierarchies and complexity of processors, and the difficulty of performance-tuning each kernel on each machine. The great majority of these kernels can be sped up substantially by machine-specific tuning, but hand tuning takes weeks or months of a skilled engineer's time, and the work must be repeated for every new microarchitecture or operating system. This research will automate the architecture-dependent tuning of numerical kernels, replacing the current hand-tuning process with a semi-automated search procedure. Prototypes of this approach exist for dense matrix multiplication (ATLAS and PHiPAC), FFTs (FFTW), and sparse matrix-vector multiplication (Sparsity). These results show that we can frequently match or even outperform hand-tuned vendor code on the kernels attempted.
These systems use a hand-written "search-directed code generator" (SDCG) to produce many different implementations of a single kernel; all of them are run on each architecture, and the fastest one is selected. This approach will be extended to a much wider range of numerical kernels by combining compiler technology with algorithm-specific transformation rules to automate the production of these SDCGs. Ultimately, the technology is expected to be useful in conventional compilers, provided that appropriate abstract data types or annotations are used to sidestep the very difficult or "impossible" dependence analysis otherwise needed to justify the desired code transformations. This work should also stimulate research into new high-level numerical methods and architectures, both of which are limited by the lack of highly tuned kernels.
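
The generate, run, and select loop that an SDCG performs can be illustrated with a small sketch. This is not code from ATLAS, PHiPAC, FFTW, or Sparsity: the function names (`generate_variant`, `reference_matmul`, `autotune`), the use of tile size as the only search parameter, and the candidate list are all hypothetical choices made for illustration. The generator emits source text for a tiled matrix multiply with the tile size baked in as a constant, each variant is compiled with Python's built-in `exec`, checked against an untuned reference, and timed, and the fastest correct variant is kept.

```python
import math
import random
import time


def generate_variant(block: int) -> str:
    """Return Python source for one tiled matrix multiply C += A @ B.

    Hypothetical generator: the tile size is baked in as a literal, mimicking
    how a search-directed code generator emits one specialized kernel per
    point in its parameter space.
    """
    return f'''
def matmul_b{block}(A, B, C, n):
    bs = {block}
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + bs, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, n)):
                            Ci[j] += a * Bk[j]
'''


def reference_matmul(A, B, n):
    """Untuned triple loop used to validate every generated variant."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def autotune(n=40, candidates=(4, 8, 16, 32, 40), reps=3, seed=0):
    """Generate, validate, and time each variant; return the fastest.

    Returns (best_block_size, {block_size: best_time_in_seconds}).
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n)] for _ in range(n)]
    B = [[rng.random() for _ in range(n)] for _ in range(n)]
    expected = reference_matmul(A, B, n)

    timings = {}
    for bs in candidates:
        namespace = {}
        exec(generate_variant(bs), namespace)  # "compile" this variant
        kernel = namespace[f"matmul_b{bs}"]
        best = math.inf
        for _ in range(reps):  # keep the best of several runs to reduce noise
            C = [[0.0] * n for _ in range(n)]
            start = time.perf_counter()
            kernel(A, B, C, n)
            best = min(best, time.perf_counter() - start)
        # Discard any variant whose output disagrees with the reference.
        if all(math.isclose(C[i][j], expected[i][j], rel_tol=1e-12)
               for i in range(n) for j in range(n)):
            timings[bs] = best

    winner = min(timings, key=timings.get)
    return winner, timings


if __name__ == "__main__":
    best_block, times = autotune()
    for bs, t in sorted(times.items()):
        print(f"block={bs:3d}  {t * 1e3:8.2f} ms")
    print("selected block size:", best_block)
```

Pure Python is used only to keep the sketch self-contained; interpreter overhead dominates at these sizes, so the timings will not show the cache effects that make tiling pay off in compiled code. The sketch also searches exhaustively over a short fixed list, whereas a production generator would emit compiled code and explore a much larger space (for example unrolling depth and register blocking alongside tile size).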

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Katherine Yelick

SPX: Collaborative Research: Global Address Programming with Accelerators
  • Award number:
    1823034
  • Fiscal year:
    2018
  • Amount:
    $497,700
  • Project type:
    Standard Grant
Student Travel Support for the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT); San Francisco, CA; October 18 - 21, 2015
  • Award number:
    1546951
  • Fiscal year:
    2015
  • Amount:
    $497,700
  • Project type:
    Standard Grant
Simulations And Analysis of Cosmic Microwave Background Polarization Data At The Petascale And Beyond
  • Award number:
    0905099
  • Fiscal year:
    2009
  • Amount:
    $497,700
  • Project type:
    Standard Grant
Collaborative Research: CRI: IAD: Development of a Research Infrastructure for the Multithreaded Computing Community Using the Cray Eldorado Platform
  • Award number:
    0709254
  • Fiscal year:
    2007
  • Amount:
    $497,700
  • Project type:
    Continuing Grant
Automated Perturbation Theory for Hamiltonian Systems
  • Award number:
    9712410
  • Fiscal year:
    1997
  • Amount:
    $497,700
  • Project type:
    Standard Grant
Software Systems for Irregular Application on Scalable Multiprocessors
  • Award number:
    9210260
  • Fiscal year:
    1992
  • Amount:
    $497,700
  • Project type:
    Continuing Grant

Similar NSFC (National Natural Science Foundation of China) Grants

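Long-Term Effects and Mechanisms of Children's Time Preferences on Academic Performance and School Behavior
  • Award number:
    72303081
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund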
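Is Play the Opposite of Work? Mechanisms by Which Gamified Work Affects Employee and Team Performance
  • Award number:
    72302024
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund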
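Effects of Ecological Migration on Migrants' Labor Market Performance, Child Development, and Intergenerational Mobility
  • Award number:
    72303181
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund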
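Mechanisms by Which Racehorse Gut Microbiota Enhance Host Athletic Performance: A Multi-Omics Analysis
  • Award number:
    32360016
  • Approval year:
    2023
  • Amount:
    CNY 320,000
  • Project type:
    Fund for Less Developed Regions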
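Manifestations, Formation Mechanisms, and Performance Effects of Emotional Contagion in E-Commerce Livestreaming: An Empirical Study from a Dynamic Perspective
  • Award number:
    72302136
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund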

Similar Overseas Grants

Development of performance parameter optimization tools for automatic tuning
  • Award number:
    23K11126
  • Fiscal year:
    2023
  • Amount:
    $497,700
  • Project type:
    Grant-in-Aid for Scientific Research (C)
CAREER: Mining Hints from Text Documents to Guide Automated Database Performance Tuning
  • Award number:
    2239326
  • Fiscal year:
    2023
  • Amount:
    $497,700
  • Project type:
    Continuing Grant
Autonomous Discovery of Performance Tuning Insights Using Autotuning
  • Award number:
    23K16890
  • Fiscal year:
    2023
  • Amount:
    $497,700
  • Project type:
    Grant-in-Aid for Early-Career Scientists
A Novel Approach to Measuring Neural Tuning to Written Words
  • Award number:
    10673192
  • Fiscal year:
    2022
  • Amount:
    $497,700
  • Project type:
Tuning mesenchymal stem cell lifespan, performance, and differentiation
  • Award number:
    DP220101644
  • Fiscal year:
    2022
  • Amount:
    $497,700
  • Project type:
    Discovery Projects