SHF: Small: Locality Aware Scheduling in Multi-GPU Systems
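Basic information
- Award number: 1907401
- Principal investigator:
- Amount: $431,600
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Duration: 2019-10-01 to 2024-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract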
Heterogeneous multiprocessor architectures consisting of Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are increasingly used to accelerate parallel workloads such as High Performance Computing (HPC) and cloud computing. GPUs provide significant performance improvements over traditional multi-core CPUs and are therefore heavily used as accelerators. Multiple GPUs are employed to further speed up execution and increase storage capacity. Current multi-GPU architectures, such as DGX, provide ultra-high-bandwidth NVLink communication to transfer data directly between GPUs. However, partitioning computations and data across multiple GPUs under various memory and communication models poses a tremendous challenge to programmers. This project develops graph-based partitioning techniques for different applications, taking into account data locality among the computations on the GPUs. In addition, the current literature on heterogeneous scheduling does not consider processing inside the GPU, leaving it to the manufacturer. This project therefore also develops a locality-based Thread Block (TB) scheduler by extending the same graph-based technique to cache block sharing.

The project is carried out in several steps. First, it develops micro-benchmarks for measuring the computation and communication costs of execution on a multi-GPU architecture. A profiling tool is developed to measure data sharing among the TBs during GPU execution. Second, an adjacency graph is designed for the multi-GPU data partition, where the vertices represent the computations and the edges represent the communication cost between vertices. A similar graph model is developed for data sharing inside a GPU, where the vertices represent the TBs and the edges represent the number of blocks shared between TBs. Third, a recursive bi-partitioning technique is developed for the adjacency graph, using known heuristics and software, to balance the load among the partitions and minimize the communication cost between them in a multi-GPU system. TB scheduling is also proposed, taking into account the L2 cache size and the resource limits inside a GPU. Fourth, the technique is extended to partition data and computations between CPUs and GPUs in a heterogeneous multiprocessor. Finally, two regular applications, LU decomposition and Wavefront, are analyzed, and multi-GPU scheduling is developed through real implementations on GPU architectures. Some irregular applications from the Rodinia and CUDA-SDK benchmarks are also analyzed to develop graph models, which are executed on GPGPUSim to verify the TB scheduling inside the GPU.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
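As a rough illustration of the intra-GPU graph model described in the abstract, the sketch below builds a graph whose vertices are thread blocks and whose edge weights count the cache blocks that two TBs both touch. The function name `build_sharing_graph`, the input format, and the sample trace are assumptions made for illustration; the abstract does not say how the project's profiling tool reports sharing.

```python
from itertools import combinations


def build_sharing_graph(tb_blocks):
    """Build a TB data-sharing graph.

    tb_blocks maps a thread-block id to the set of cache-block ids it
    accessed (a hypothetical profiler output). The result maps each pair
    (u, v) with u < v to the number of cache blocks both TBs accessed;
    pairs that share nothing are omitted.
    """
    edges = {}
    for u, v in combinations(sorted(tb_blocks), 2):
        shared = len(tb_blocks[u] & tb_blocks[v])
        if shared:
            edges[(u, v)] = shared
    return edges


# Hypothetical trace: cache-block ids touched by four thread blocks.
trace = {
    0: {10, 11, 12},
    1: {11, 12, 13},
    2: {13, 14},
    3: {20, 21},
}

graph = build_sharing_graph(trace)
print(graph)  # {(0, 1): 2, (1, 2): 1}
```

In this toy trace, TB 0 and TB 1 share two cache blocks, so a locality-aware scheduler would prefer to run them where they can reuse the same cached data, while TB 3 shares nothing and can be placed freely.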
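The recursive bi-partitioning step named in the abstract can be sketched as follows. The abstract only says that known heuristics and software are used, so the greedy growth heuristic here is a stand-in chosen for brevity, not the project's method; the function names, the unit vertex weights, and the example graph are all hypothetical.

```python
def to_adjacency(edges):
    """Turn {(u, v): w} into a symmetric adjacency map {u: {v: w}}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def bisect(vertices, weight, adj):
    """Split vertices into two parts of roughly equal total weight.

    One part is grown greedily from the lowest-numbered vertex by adding
    the outside vertex most strongly connected to it, so heavily
    communicating vertices tend to stay together.
    """
    vertices = sorted(vertices)
    target = sum(weight[v] for v in vertices) / 2
    part_a, load = [vertices[0]], weight[vertices[0]]
    rest = vertices[1:]
    while rest:
        # Ties go to the lowest vertex id because rest stays sorted.
        best = max(rest, key=lambda v: sum(adj.get(v, {}).get(u, 0) for u in part_a))
        if load + weight[best] > target:
            break
        part_a.append(best)
        rest.remove(best)
        load += weight[best]
    return part_a, rest


def recursive_bisect(vertices, weight, adj, depth):
    """Bisect recursively; depth d yields up to 2**d parts."""
    if depth == 0 or len(vertices) < 2:
        return [sorted(vertices)]
    a, b = bisect(vertices, weight, adj)
    return (recursive_bisect(a, weight, adj, depth - 1)
            + recursive_bisect(b, weight, adj, depth - 1))


def cut_cost(parts, edges):
    """Total weight of edges whose endpoints land in different parts."""
    owner = {v: i for i, part in enumerate(parts) for v in part}
    return sum(w for (u, v), w in edges.items() if owner[u] != owner[v])


# Hypothetical graph: eight unit-weight computations, edge weight = communication cost.
edges = {(0, 1): 5, (1, 2): 2, (2, 3): 5, (3, 4): 1,
         (4, 5): 5, (5, 6): 2, (6, 7): 5}
weight = {v: 1 for v in range(8)}

parts = recursive_bisect(range(8), weight, to_adjacency(edges), depth=2)
print(parts)                   # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(cut_cost(parts, edges))  # 5
```

Two levels of bisection give four equally loaded parts, one per GPU in a hypothetical four-GPU system, and only the lighter edges are cut. A production partitioner would add a refinement pass and handle disconnected or unevenly weighted graphs more carefully.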
Project outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
GreenMD: Energy-efficient Matrix Decomposition on Heterogeneous Multi-GPU Systems
- DOI: 10.1145/3583590
- Publication date: 2023
- Journal:
- Impact factor: 1.6
- Authors: Zamani, Hadi; Bhuyan, Laxmi; Chen, Jieyang; Chen, Zizhong
- Corresponding author: Chen, Zizhong
Improving Energy Saving of One-Sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
- DOI: 10.1145/3572848.3577496
- Publication date: 2023-01
- Journal:
- Impact factor: 0
- Authors: Jieyang Chen; Xin Liang; Kai Zhao; H. Sabzi; L. Bhuyan; Zizhong Chen
- Corresponding author: Jieyang Chen; Xin Liang; Kai Zhao; H. Sabzi; L. Bhuyan; Zizhong Chen
Other publications by Laxmi Bhuyan
Assertion Based Verification and Analysis of Network Processor Architectures
- DOI: 10.1007/s10617-005-1193-5
- Publication date: 2005-07-11
- Journal:
- Impact factor: 0.900
- Authors: Xi Chen; Yan Luo; Harry Hsieh; Laxmi Bhuyan; Felice Balarin
- Corresponding author: Felice Balarin
Other grants by Laxmi Bhuyan
Travel: Student Travel Support to NAS 2021
- Award number: 2139217
- Fiscal year: 2021
- Funding amount: $431,600
- Project type: Standard Grant
SHF: Medium: Energy Efficient Computing on GPU-based Heterogeneous Systems
- Award number: 1513201
- Fiscal year: 2015
- Funding amount: $431,600
- Project type: Continuing Grant
SHF: Small: Efficient CPU-GPU Communication for Heterogeneous Architectures
- Award number: 1423108
- Fiscal year: 2014
- Funding amount: $431,600
- Project type: Standard Grant
EAGER: Developing a Programming Environment for Heterogenous Multiprocessors
- Award number: 1157377
- Fiscal year: 2012
- Funding amount: $431,600
- Project type: Standard Grant
CSR: Small: Power-Efficient Multicore Scheduling for Network Applications
- Award number: 1216014
- Fiscal year: 2012
- Funding amount: $431,600
- Project type: Standard Grant
SHF: Medium: Hardware/Software Partitioning for Hybrid Shared Memory Multiprocessors
- Award number: 0905509
- Fiscal year: 2009
- Funding amount: $431,600
- Project type: Standard Grant
CSR: Small: Core Scheduling to Improve Virtualized I/O Performance on Multi-Core Systems
- Award number: 0912850
- Fiscal year: 2009
- Funding amount: $431,600
- Project type: Standard Grant
CPA-CSA: Virtualization-Aware Architectures to Accelerate Network I/O Processing
- Award number: 0811834
- Fiscal year: 2008
- Funding amount: $431,600
- Project type: Standard Grant
NEDG: Application Oriented Edge Routers
- Award number: 0832108
- Fiscal year: 2008
- Funding amount: $431,600
- Project type: Standard Grant
MRI: Acquisition of an Ultra Low-Latency Multiprocessor System with On-Board Hardware Accelerators
- Award number: 0619223
- Fiscal year: 2006
- Funding amount: $431,600
- Project type: Standard Grant
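Similar grants from the National Natural Science Foundation of China
Forensic application of circadian-rhythm small RNAs in estimating the time of bloodstain formation
- Award number:
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Award number:
- Approval year: 2022
- Funding amount: CNY 100,000
- Project type: Provincial/municipal project
Response and molecular mechanism of small RNA regulation of type I-F CRISPR-Cas adaptive immunity
- Award number: 32000033
- Approval year: 2020
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Award number: 31972324
- Approval year: 2019
- Funding amount: CNY 580,000
- Project type: General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing and biofilm formation
- Award number: 81900988
- Approval year: 2019
- Funding amount: CNY 210,000
- Project type: Young Scientists Fund
Function and mechanism of key gut-bacterial small RNAs in the onset and progression of Crohn's disease
- Award number: 31870821
- Approval year: 2018
- Funding amount: CNY 560,000
- Project type: General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
- Award number: 31802058
- Approval year: 2018
- Funding amount: CNY 260,000
- Project type: Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Award number: 31772128
- Approval year: 2017
- Funding amount: CNY 600,000
- Project type: General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
- Award number: 81704176
- Approval year: 2017
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
- Award number: 91640114
- Approval year: 2016
- Funding amount: CNY 850,000
- Project type: Major Research Plan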
Similar overseas grants
Collaborative Research: SHF: Small: Reimagining Communication Bottlenecks in GNN Acceleration through Collaborative Locality Enhancement and Compression Co-Design
- Award number: 2326494
- Fiscal year: 2023
- Funding amount: $431,600
- Project type: Standard Grant
Collaborative Research: SHF: Small: Reimagining Communication Bottlenecks in GNN Acceleration through Collaborative Locality Enhancement and Compression Co-Design
- Award number: 2326495
- Fiscal year: 2023
- Funding amount: $431,600
- Project type: Standard Grant
CIF: Small: Load Balancing for Cloud Networks: Data Locality Issues and Modern Algorithms
- Award number: 2113027
- Fiscal year: 2021
- Funding amount: $431,600
- Project type: Standard Grant
OAC: Small: Data Locality Optimization for Sparse Matrix/Tensor Computations
- Award number: 2009007
- Fiscal year: 2020
- Funding amount: $431,600
- Project type: Standard Grant
AF: Small: Toward A Unified Model of Parallelism And Locality
- Award number: 1911245
- Fiscal year: 2019
- Funding amount: $431,600
- Project type: Standard Grant
AF: Small: Locality and Energy in Distributed Computing
- Award number: 1815316
- Fiscal year: 2018
- Funding amount: $431,600
- Project type: Standard Grant
SHF: Small: The Loop Chain Abstraction for Balancing Locality and Parallelism
- Award number: 1700723
- Fiscal year: 2016
- Funding amount: $431,600
- Project type: Standard Grant
SHF: Small: Locality-Aware Concurrency Platforms
- Award number: 1527692
- Fiscal year: 2015
- Funding amount: $431,600
- Project type: Standard Grant
SHF: Small: The Loop Chain Abstraction for Balancing Locality and Parallelism
- Award number: 1422725
- Fiscal year: 2014
- Funding amount: $431,600
- Project type: Standard Grant
SHF: AF: Small: Locality with Dynamic Parallelism
- Award number: 1018188
- Fiscal year: 2010
- Funding amount: $431,600
- Project type: Continuing Grant
