MRI: Acquisition of a Heterogeneous Multi-GPU Cluster to Support Exploration at Scale

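Basic information

  • Award number:
    1920020
  • Principal investigator:
  • Amount:
    $399,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-10-01 to 2021-09-30
  • Project status:
    Completed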

Project abstract

This project aims to acquire a heterogeneous multi-GPU cluster built from state-of-the-art GPU devices connected by emerging NVLink and HDR networks, with network-attached non-volatile memory (NVM) storage for GPU caching and a smart HDR InfiniBand switch. The cluster will enable, accelerate, explore, and support applications at scale from different domains, including:
  • Distributed deep neural networks for retinopathy,
  • Wireless network forensics,
  • Adversarial machine learning,
  • Computational social science,
  • Mathematical optimization and big data analytics,
  • Coastal engineering modeling, and
  • Multi-GPU systems (including NVMe technology to support caching in the GPU network and a smart network switch that can offload collective operations).

These features will enable computational scientists to exploit GPU parallelism in new ways by programming the smart network switch and caching selectively to hide memory and interconnect latency.

Currently, graphics processing units (GPUs) provide high computational throughput by launching a large number of threads and overlapping compute and memory operations. Combined with low-overhead thread swapping, GPUs can hide long memory operations. But underlying system architectures have not kept up as the size and complexity of GPU applications grow. Multi-GPU solutions are less programmer-friendly and scale less well than multi-CPU systems when their architectural support is compared. Current GPU systems treat GPUs as discrete devices, with limited support for a truly shared-memory programming model. Since multi-GPU interconnect bandwidth has become a limiting factor for scaling multi-GPU systems, it is necessary to explore new network topologies, smarter network elements, and enhanced software layers for caching and prefetching that meet the needs of tomorrow's demanding data applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by David Kaeli

Intra-Cluster Coalescing and Distributed-Block Scheduling to Reduce GPU NoC Pressure
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    3.7
  • Authors:
    Lu Wang; Xia Zhao; David Kaeli; Zhiying Wang; Lieven Eeckhout
  • Corresponding author:
    Lieven Eeckhout
OpenCL Case Study: Histogram
  • DOI:
  • Published:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Benedict R. Gaster; Lee Howes; David Kaeli; Perhaad Mistry; Dana Schaa
  • Corresponding author:
    Dana Schaa
MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Hongwu Peng; Xi Xie; Kaustubh Shivdikar; Md Amit Hasan; Jiahui Zhao; Shaoyi Huang; Omer Khan; David Kaeli; Caiwen Ding
  • Corresponding author:
    Caiwen Ding
Scalability Limitations of Processing-in-Memory using Real System Evaluations
Addressing a workload characterization study to the design of consistency protocols
  • DOI:
    10.1007/s11227-006-7866-4
  • Published:
    2006-10-01
  • Journal:
  • Impact factor:
    2.700
  • Authors:
    Salvador Petit; Julio Sahuquillo; Ana Pont; David Kaeli
  • Corresponding author:
    David Kaeli

Other grants by David Kaeli

Collaborative Research: CSR: Medium: Architecting GPUs for Practical Homomorphic Encryption-based Computing
  • Award number:
    2312275
  • Fiscal year:
    2023
  • Amount:
    $399,700
  • Project type:
    Continuing Grant
REU Site: REU Research Experiences and Mentoring in Data-Driven Discovery
  • Award number:
    1559894
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Project type:
    Standard Grant
Student Travel for PACT 2016
  • Award number:
    1624175
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Project type:
    Standard Grant
STARSS: Small: Side-Channel Analysis and Resiliency Targeting Accelerators
  • Award number:
    1618379
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Project type:
    Standard Grant
Northeastern University Planning Grant: I/UCRC for Energy-Smart Electronic Systems
  • Award number:
    1624662
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Project type:
    Standard Grant
CSR: Small: Collaborative Research: Leveraging Intra-chip/Inter-chip Silicon-Photonic Networks for Designing Next-Generation Accelerators
  • Award number:
    1525412
  • Fiscal year:
    2015
  • Amount:
    $399,700
  • Project type:
    Standard Grant
Support for the 37th International Symposium on Computer Architecture (ISCA 2010)
  • Award number:
    1041971
  • Fiscal year:
    2010
  • Amount:
    $399,700
  • Project type:
    Standard Grant
SHF: Small: The Cross-layer Reliability Stack
  • Award number:
    1017439
  • Fiscal year:
    2010
  • Amount:
    $399,700
  • Project type:
    Standard Grant
A Biomedical Imaging Acceleration Testbed
  • Award number:
    0946463
  • Fiscal year:
    2009
  • Amount:
    $399,700
  • Project type:
    Standard Grant
CRI: CRD Collaborative Research: Archer - Seeding a Community-based Computing Infrastructure for Computer Architecture Research and Education
  • Award number:
    0751091
  • Fiscal year:
    2008
  • Amount:
    $399,700
  • Project type:
    Standard Grant

Similar overseas grants

MRI: Acquisition of a Heterogeneous High-Performance Computing Cluster Driven by Computational and Data-Intensive Multidisciplinary Research
  • Award number:
    2216311
  • Fiscal year:
    2022
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of a Micro-Transfer Printer for Heterogeneous Integration of Electronic/Photonic Microsystems
  • Award number:
    2117812
  • Fiscal year:
    2021
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of a Heterogeneous GPU Cluster to Facilitate Deep Learning Research at UMBC
  • Award number:
    1920079
  • Fiscal year:
    2019
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of Heterogeneous Computer System for Machine Learning
  • Award number:
    1919752
  • Fiscal year:
    2019
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of a Heterogeneous Computing Platform for Biometrics Research
  • Award number:
    1626360
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of a Heterogeneous Networked Instrument for Aquatic Exploration and Intelligent Sampling
  • Award number:
    1531322
  • Fiscal year:
    2015
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI Consortium: Acquisition of a Heterogeneous, Shared, Computing Instrument to Enable Science and Computing Research by the Mass. Green High Performance Computing Consortium
  • Award number:
    1538918
  • Fiscal year:
    2014
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of SuperMIC -- A Heterogeneous Computing Environment to Enable Transformation of Computational Research and Education in the State of Louisiana
  • Award number:
    1338051
  • Fiscal year:
    2013
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI Consortium: Acquisition of a Heterogeneous, Shared, Computing Instrument to Enable Science and Computing Research by the Mass. Green High Performance Computing Consortium
  • Award number:
    1229059
  • Fiscal year:
    2012
  • Amount:
    $399,700
  • Project type:
    Standard Grant
MRI: Acquisition of a High-Performance Instrument for Heterogeneous and Biologically Inspired Architectures Research at CUA
  • Award number:
    1126120
  • Fiscal year:
    2011
  • Amount:
    $399,700
  • Project type:
    Standard Grant