Collaborative Research: SHF: MEDIUM: Smart Integrated Tuning of Parallel Code for Multicore and Manycore Systems


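Basic Information

Award number:
2211982
Principal investigator:
Ali Jannesari
Amount:
approximately $502,300
Host institution:
Iowa State University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-10-01 to 2025-09-30
Project status:
Not concluded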

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2211982&HistoricalAwards=false
Keywords:
Collaborative Research SHF MEDIUM Smart

Project Abstract

High Performance Computing (HPC) entails executing code on multicore and manycore architectures. To better utilize multicore/manycore architectures, parallel programming models have emerged. But often using these parallel models naively will not be able to scratch the surface of the potential performance gains such systems can provide. A common technique for improving performance is to add more hardware resources. However, this is expensive and system integration is usually an onerous task. To this end, the investigators propose a framework of improving performance by better utilization of the available resource and identifying near-optimal configuration. These configurations can take the form of code optimizations, as well as intelligent resource mapping and utilization. Specifically, this project is concerned with identifying code optimizations and runtime configurations that can potentially speed up executions manifold. Faster executions can also implicitly lead to reduced power consumption. Additionally, for situations where existing execution performance is acceptable, the proposed approach can also be extended to optimize for other performance metrics such as power. Power consumption is usually a huge bottleneck for HPC systems, and is a source of concern for organizations that deploy such systems; these concerns are both fiscal and environmental. The investigators posit that the framework outlined in this project can also be extended to optimize for power consumption without compromising execution performance.

The investigators’ aim is to provide such an AI-assisted framework that can automatically configure parallel code considering the underlying hardware architecture. The steps necessary to build such a framework lie at the convergence of compiler technologies, performance analysis and modeling, and deep learning. A primary driver of this project will be developing a program representation technique targeted towards parallel code. Existing representations target mostly serial code and cannot fully encapsulate the interactions and complexities of parallel code. Such a code representation technique is highly suited to analyses using deep learning. A means of representing parallel code in a machine learning friendly format will be very beneficial to the overall program analysis community. The proposed code representation will take the form of a graph, in order to correctly typify the inherent structure present in code. The investigators propose modeling this code representation using state-of-the-art Graph Neural Network (GNN) techniques. The modeled embeddings will be used in conjunction with task specific features in order to identify near optimum configurations for improved performance. The overall scale of this project will span the entire “source code to execution” pipeline that most HPC workloads follow. The aim of this project is to optimize each optimizable step in the pipeline. A sample optimization pipeline can take the following form: given a parallel code, our GNN-based code optimization model will predict the best optimizations for the given code, followed by identifying the best device (CPU, GPU, and others) for executing the optimized code. Further downstream, our framework will identify the optimum runtime configurations appropriate for the device under consideration. The ideas presented in this project can have the potential effect of increased hardware utilization and reduced future hardware commissioning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
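The sample optimization pipeline described in the abstract (graph representation, then optimization choice, then device choice, then runtime configuration) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the node kinds, the one-round mean aggregation standing in for a trained GNN, the heuristic scores, and the names `embed`, `pick`, `tune`, and `Plan` are hypothetical and are not taken from the investigators' framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy graph of a parallel region: a list of node kinds plus
# undirected edges. A real system would derive this from compiler IR.
Graph = Tuple[List[str], List[Tuple[int, int]]]

NODE_KINDS = ["loop", "load", "store", "arith", "barrier"]


def embed(graph: Graph) -> List[float]:
    """One round of mean neighbor aggregation followed by a sum readout.

    A stand-in for a trained GNN, used only to show the data flow.
    """
    kinds, edges = graph
    # One-hot feature vector per node.
    feats = [[1.0 if k == kind else 0.0 for k in NODE_KINDS] for kind in kinds]
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(kinds))}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    updated = []
    for i, own in enumerate(feats):
        group = [own] + [feats[j] for j in neighbors[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    # Graph-level embedding: sum the updated node vectors.
    return [sum(col) for col in zip(*updated)]


def pick(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate."""
    return max(candidates, key=score)


@dataclass
class Plan:
    optimization: str
    device: str
    threads: int


def tune(graph: Graph) -> Plan:
    emb = dict(zip(NODE_KINDS, embed(graph)))
    memory_weight = emb["load"] + emb["store"]
    compute_weight = emb["arith"]
    # Stage 1: choose a code optimization (placeholder heuristic scores).
    optimization = pick(
        ["tiling", "vectorize"],
        lambda o: memory_weight if o == "tiling" else compute_weight,
    )
    # Stage 2: choose a device for the optimized code.
    device = pick(
        ["cpu", "gpu"],
        lambda d: compute_weight if d == "gpu" else memory_weight + emb["barrier"],
    )
    # Stage 3: choose a runtime configuration for the selected device
    # (the thread counts are arbitrary illustrative values).
    threads = 256 if device == "gpu" else 8
    return Plan(optimization, device, threads)
```

In this sketch each stage consumes the same graph embedding, which mirrors the abstract's idea of combining a learned code representation with task-specific decisions; a real implementation would replace the heuristic scores with trained predictors.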


Project Outputs

Journal papers (2)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Power Constrained Autotuning using Graph Neural Networks


DOI:
10.1109/ipdps54959.2023.00060
Publication date:
2023
Journal:
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Impact factor:
0
Authors:
Dutta, Akash;Choi, Jee;Jannesari, Ali
Corresponding author:
Jannesari, Ali

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

DOI:
10.1145/3588195.3592984
Publication date:
2023-04
Journal:
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
Impact factor:
0
Authors:
Akashnil Dutta;J. Alcaraz;Ali TehraniJamsaz;Eduardo César;A. Sikora;A. Jannesari
Corresponding author:
Akashnil Dutta;J. Alcaraz;Ali TehraniJamsaz;Eduardo César;A. Sikora;A. Jannesari