
SI2-SSE: BONSAI: An Open Software Infrastructure for Parallel Autotuning of Computational Kernels


Basic Information

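Award number:
1642441
Principal investigator:
Jakub Kurzak
Amount:
$500,000
Host institution:
University of Tennessee Knoxville
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Start and end dates:
2016-11-01 to 2019-10-31
Project status:
Completed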

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1642441&HistoricalAwards=false
Keywords:
SI2 SSE BONSAI Open Software

Project Abstract

Most supercomputers today accelerate computations by using processors with many cores to solve important problems in science and engineering. Although this reduces the cost of the hardware system, it greatly increases the complexity of writing and optimizing ("tuning") software. This project extends a previously funded NSF project, the Benchtesting Environment for Automated Software Tuning (BEAST), to create a software toolkit that allows for semi-automatic optimization of software, thereby reducing the programming overhead. The project, BEAST OpeN Software Autotuning Infrastructure (BONSAI), will greatly increase the efficiency with which scientists and engineers develop fast and efficient programs to solve their problems. BONSAI has tremendous support from various computer processor manufacturing companies and academic institutions. The BONSAI system will be available as open-source software for academic and commercial use, and many students will be trained in using the software.
The emergence and growing dominance of hybrid systems that incorporate accelerator processors, such as GPUs and coprocessors, have made it far more difficult to optimize the performance of the different computational kernels that do the majority of the work in most research applications. The BONSAI project aims to create a transformative solution to this problem by developing a software infrastructure that uses parallel hybrid machines to enable large autotuning sweeps on computational kernels for GPU accelerators and many-core coprocessors. The system will go beyond just measuring runtimes, allowing for the collection and analysis of non-trivial amounts of data from hardware performance counters and power meters. The system will have a modular architecture and rely on standard data formats and interfaces to easily connect with mainstream tools for data analytics and visualization.
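As a rough illustration of what an autotuning sweep that emits results in a standard data format might look like, here is a minimal serial sketch in Python. It is not BONSAI code: the toy kernel `blocked_sum`, the tuning parameter `block`, and the helper names `sweep` and `to_csv` are all hypothetical, and only wall-clock time is recorded (no hardware counters or power readings).

```python
import csv
import io
import time


def blocked_sum(data, block):
    """Toy kernel (hypothetical): sum a list in chunks of `block` elements."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def sweep(data, blocks, repeats=3):
    """Time the kernel for every block size, keeping the best of `repeats` runs."""
    records = []
    for block in blocks:
        best_time = float("inf")
        result = None
        for _ in range(repeats):
            t0 = time.perf_counter()
            result = blocked_sum(data, block)
            best_time = min(best_time, time.perf_counter() - t0)
        records.append({"block": block, "seconds": best_time, "result": result})
    return records


def to_csv(records):
    """Serialize the sweep records as CSV text, readable by R or pandas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["block", "seconds", "result"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


data = list(range(10_000))
records = sweep(data, blocks=[16, 64, 256, 1024])
csv_text = to_csv(records)
best = min(records, key=lambda r: r["seconds"])
print(csv_text)
print("fastest block size:", best["block"])
```

Which block size wins depends on the machine and on timing noise, so the sketch prints the result rather than assuming one. Keeping one record per configuration in a flat table is what makes the output easy to load into mainstream analysis tools.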
The BONSAI project will leverage the experience of the BEAST project, which established a successful autotuning methodology and validated an autotuning workflow. BONSAI will equip the community with a software environment for applying parallel resources to the tuning and performance analysis of computational kernels. Specifically, the work will be organized around the following objectives:
(1) Harden and extend the programming language called BeastLang, which was created during prior research as a way of defining the search space that the autotuning infrastructure generates and explores. BeastLang enables users to create parameterized kernel specifications that encode the interplay between the kernels themselves, the compilation tools, and the target hardware. It will be integrated with the other components of BONSAI, have its Python syntax enhanced and extended, its compiler improved, and be supplemented with a runtime that supports it with multi-way parallelism for the autotuning process.
(2) Develop and test a benchtesting engine for making large-scale parallel tuning sweeps, using large numbers of GPU accelerators or many-core coprocessors. This engine will support both parallel compilation and parallel tests of the resulting kernels, using many distributed memory nodes and multithreading within each node, with dynamic load balancing. It will produce an extensive collection of performance information from hardware counters, and possibly energy meters, as well as information about the saturation of the compiled code with different classes of instructions.
(3) Develop and test a software infrastructure for collecting, preprocessing, and analyzing BONSAI performance data. The system will a) simplify the task of instrumenting the kernel and provide a simple interface for selecting the metrics to be collected, with sensible defaults; b) simplify the process of collecting hardware counters and performance data from various open-source and vendor-specific libraries; and c) provide tools that allow the user to quickly and efficiently transform output data to a format that can be easily read and analyzed using mainstream tools such as R and Python.
(4) Document and illustrate the process of using BONSAI to tune different types of kernels. These model case studies will include discussions of how BeastLang was applied to create the parameterized kernel stencil, how the parallel benchtesting engine is invoked to generate and explore the search space, and how the data collected from the operation of the engine can be analyzed and visualized to gain insights that can correct or refine the process for another iteration.
The BONSAI project has the potential to fundamentally transform autotuning research by:
1) Making autotuning accessible to a broad audience of developers from a broad range of computing disciplines, as opposed to a few selected individuals with the wizardry to set up a successful experiment within the confines of serial execution.
2) Changing the general perception of autotuning from merely a means of producing fast code to a general technique for performance analysis and reasoning about complex software and hardware interactions, and positioning the technique as one of the primary tools for hardware-software co-design.
3) Boosting interest in exploring neglected avenues of computing, such as unorthodox data layouts, and challenging the status quo of legacy software interfaces.
BONSAI has the potential to bring autotuning to the forefront of software development and to help position autotuning as a pillar of software engineering.
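The objectives above combine two ideas: a declaratively defined, constraint-pruned search space and a parallel engine that evaluates its points with dynamic load balancing. The sketch below illustrates both in plain Python. It does not use BeastLang syntax or any BONSAI API; the parameter names (`dim_m`, `dim_n`, `unroll`), the 1024-thread limit, and the `benchmark` stand-in are assumptions made for illustration, and a thread pool stands in for distributed nodes and accelerators.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def search_space():
    """Yield only the valid configurations; constraints prune the rest."""
    for dim_m in (8, 16, 32, 64):
        for dim_n in (8, 16, 32, 64):
            # Assumed hardware limit on threads per block (hypothetical value).
            if dim_m * dim_n > 1024:
                continue
            for unroll in (1, 2, 4, 8, 16):
                # Assumed constraint: the unroll factor must divide dim_n.
                if dim_n % unroll:
                    continue
                yield {"dim_m": dim_m, "dim_n": dim_n, "unroll": unroll}


def benchmark(config):
    """Stand-in for compiling and running one kernel variant (hypothetical)."""
    n = config["dim_m"] * config["dim_n"]
    t0 = time.perf_counter()
    acc = 0
    for i in range(0, n, config["unroll"]):
        acc += i
    return {**config, "seconds": time.perf_counter() - t0}


def parallel_sweep(configs, workers=4):
    """Evaluate all configurations; idle workers pull the next pending task."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(benchmark, c) for c in configs]
        for future in as_completed(futures):
            results.append(future.result())
    return results


configs = list(search_space())
results = parallel_sweep(configs)
best = min(results, key=lambda r: r["seconds"])
print(len(configs), "valid configurations out of", 4 * 4 * 5)
print("fastest configuration:", best)
```

Because each worker takes a new task as soon as it finishes the previous one, slow configurations do not stall the rest of the sweep, which is the essence of dynamic load balancing. In a real setting, timings taken while other variants run concurrently on the same device would be noisy, so an engine of this kind would need to isolate measurements per accelerator.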
