CC* Compute: A Cost-Effective, 2,048 Core InfiniBand Cluster at UTC for Campus Research and Education
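Basic Information
- Grant number: 1925603
- Principal investigator:
- Amount: $392.2K
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-07-01 to 2022-06-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract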
A team of researchers at the University of Tennessee at Chattanooga (UTC) will make a significant upgrade to the campus cyberinfrastructure that will provide state-of-the-art, cost-effective high-performance computing not previously possible. This project will significantly improve university researchers' and students' ability to perform, enhance, and expand their current computationally-intensive research, prototyping, and development activities and will complement other investments already made, in-progress, or on the plan-of-record of UTC, including access to commercial cloud computing services. In addition to computer science and engineering, the UTC team anticipates significant research projects in mathematics, hydrology and computational fluid dynamics which will engage four regional partner universities. Two teaching projects address HPC education and use of HPC for mechanical engineering undergraduate research/design. In addition to these funded projects, merited additional research projects are enabled over time as the PIs, Central IT, and the cluster's Advisory Board attract and onboard additional researchers and students requiring HPC. Among other users are the more than 20 computational science Ph.D. students, plus several postdocs. Furthermore, SimCenter---UTC's research computing hub---supports undergraduate research through self-funding and REU in HPC, providing additional users for the proposed cluster. This award allows the University of Tennessee at Chattanooga (UTC) to procure an innovative, 2,048-compute core, 16-server AMD EPYC2 cluster networked with 100Gbit/s InfiniBand plus 8TB of main memory and 77 Tflop/s of double-precision floating point arithmetic. EPYC2 Rome 7nm processors will be newly available at or near the start of the period of performance, so this project includes state-of-the-art, cost-effective, high-performance computing not previously possible using Intel or AMD processors. 
The university has invested in a "commodity" cluster as recently as three years ago, and it is heavily utilized by the existing user base. This system will be nearly four years old by the beginning of this proposed grant. By way of complement, upgrades to storage (1.1PB), internal networking, data center infrastructure, and private cloud virtualization (coming online by mid-2019) have prepared UTC to support a new campus-wide cluster with a growing number of users in addition to those named here. The proposed new campus cluster will enable core scales and total cluster memory not previously available on campus and thus help researchers prepare their scalable problem scenarios for greater scales on national resources such as XSEDE. Projects enabled immediately are 14 science driver projects (12 research, two teaching). Seven projects involve four regional partner universities. At least ten NSF grants at UTK, UTC, UAB, Tennessee Tech, and Ole Miss are enhanced. Project areas highlighted include fault-tolerant parallel computing, performance monitoring of HPC, next-generation parallel programming with MPI, special-purpose linear algebra, hydrology, and computational fluid dynamics research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
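The headline figures in the abstract (2,048 cores, 16 servers, 8TB of memory, 77 Tflop/s) can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function name `derived_figures` is hypothetical, the memory figure is treated as binary terabytes, and the value of 16 double-precision flops per core per cycle is an assumption about the processor's vector units, not something stated in the abstract.

```python
# Figures quoted in the abstract.
TOTAL_CORES = 2048
SERVERS = 16
TOTAL_MEMORY_TB = 8
PEAK_TFLOPS = 77.0

# Assumption (not from the abstract): each core retires 16 double-precision
# flops per cycle (two 256-bit fused multiply-add units).
FLOPS_PER_CYCLE = 16


def derived_figures():
    """Return per-server and per-core figures implied by the quoted totals."""
    memory_gb = TOTAL_MEMORY_TB * 1024  # assumes binary terabytes
    gflops_per_core = PEAK_TFLOPS * 1000 / TOTAL_CORES
    return {
        "cores_per_server": TOTAL_CORES // SERVERS,
        "memory_per_server_gb": memory_gb / SERVERS,
        "memory_per_core_gb": memory_gb / TOTAL_CORES,
        "gflops_per_core": gflops_per_core,
        # Clock rate that would make the quoted peak consistent
        # with the assumed flops per cycle.
        "implied_clock_ghz": gflops_per_core / FLOPS_PER_CYCLE,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the totals work out to 128 cores and 512 GB per server, 4 GB per core, and an implied clock of roughly 2.35 GHz, which suggests the quoted peak is a theoretical figure rather than a measured one.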
Project Outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Implementation and evaluation of MPI 4.0 partitioned communication libraries
- DOI: 10.1016/j.parco.2021.102827
- Published: 2021
- Journal:
- Impact factor: 1.4
- Authors: Dosanjh, Matthew G.F.; Worley, Andrew; Schafer, Derek; Soundararajan, Prema; Ghafoor, Sheikh; Skjellum, Anthony; Bangalore, Purushotham V.; Grant, Ryan E.
- Corresponding author: Grant, Ryan E.
Design of a Portable Implementation of Partitioned Point-to-Point Communication Primitives
- DOI: 10.1145/3458744.3474046
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Worley, Andrew; Soundararajan, Prema; Schafer, Derek; Bangalore, Purushotham; Grant, Ryan; Dosanjh, Matthew; Skjellum, Anthony; Ghafoor, Sheikh
- Corresponding author: Ghafoor, Sheikh
Other publications by Anthony Skjellum
Understanding GPU Triggering APIs for MPI+X Communication
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Patrick G. Bridges; Anthony Skjellum; E. Suggs; Derek Schafer; P. Bangalore
- Corresponding author: P. Bangalore
MitM attacks on intellectual property and integrity of additive manufacturing systems: A security analysis
- DOI: 10.1016/j.cose.2024.103810
- Published: 2024-05-01
- Journal:
- Impact factor: 5.400
- Authors: Hamza Alkofahi; Heba Alawneh; Anthony Skjellum
- Corresponding author: Anthony Skjellum
Other grants by Anthony Skjellum
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Grant number: 2412182
- Fiscal year: 2023
- Award amount: $392.2K
- Project type: Standard Grant
Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability
- Grant number: 2405142
- Fiscal year: 2023
- Award amount: $392.2K
- Project type: Standard Grant
Beginnings: Creating and Sustaining a Diverse Community of Expertise in Quantum Information Science (EQUIS) Across the Southeastern United States
- Grant number: 2414461
- Fiscal year: 2023
- Award amount: $392.2K
- Project type: Cooperative Agreement
Collaborative Research: EAGER: Real-time Strategies and Synchronized Time Distribution Mechanisms for Enhanced Exascale Performance-Portability and Predictability
- Grant number: 2151020
- Fiscal year: 2022
- Award amount: $392.2K
- Project type: Standard Grant
CC* Networking Infrastructure: Advancing High-speed Networking at UTC for Research and Education
- Grant number: 1925598
- Fiscal year: 2019
- Award amount: $392.2K
- Project type: Standard Grant
SPX: Collaborative Research: Intelligent Communication Fabrics to Facilitate Extreme Scale Computing
- Grant number: 1918987
- Fiscal year: 2019
- Award amount: $392.2K
- Project type: Standard Grant
Collaborative Research: Software Engineering Workforce Development in High Performance Computing for Digital Twins
- Grant number: 1935628
- Fiscal year: 2019
- Award amount: $392.2K
- Project type: Standard Grant
Collaborative Research: CICI: Regional: SouthEast SciEntific Cybersecurity for University Research (SouthEast SECURE)
- Grant number: 1812404
- Fiscal year: 2017
- Award amount: $392.2K
- Project type: Standard Grant
SHF: Medium: Collaborative Research: Next-Generation Message Passing for Parallel Programming: Resiliency, Time-to-Solution, Performance-Portability, Scalability, and QoS
- Grant number: 1822191
- Fiscal year: 2017
- Award amount: $392.2K
- Project type: Continuing Grant
CICI: Data Provenance: Collaborative Research: Provenance Assurance Using Currency Primitives
- Grant number: 1821926
- Fiscal year: 2017
- Award amount: $392.2K
- Project type: Standard Grant
Similar overseas grants
CC* Campus Compute: UTEP Cyberinfrastructure for Scientific and Machine Learning Applications
- Grant number: 2346717
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Standard Grant
SHF: Small: Redesigning the Memory System in the Era of Compute Express Link
- Grant number: 2333049
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Standard Grant
CC* Campus Compute: Building a Computational Cluster for Scientific Discovery
- Grant number: 2346673
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Standard Grant
CC* Campus Compute: Interdisciplinary GPU-Enabled Compute
- Grant number: 2346343
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Standard Grant
MYRTUS: Multi-layer 360° dYnamic orchestration and interopeRable design environmenT for compute-continUum Systems
- Grant number: 10087666
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: EU-Funded
CAREER: Reinventing Computer Vision through Bio-inspired Retinomorphic Vision Sensors, Corticomorphic Compute-In-Memory Processors and Event-based Algorithms
- Grant number: 2338171
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Continuing Grant
Equipment: CC* Campus Compute: A High-Performance Computing System for Research and Education in Arkansas
- Grant number: 2346752
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Standard Grant
Research Infrastructure: CC* Campus Compute: Lawrence 2.0: Advancing Multi-Disciplinary Research and Education in South Dakota
- Grant number: 2346643
- Fiscal year: 2024
- Award amount: $392.2K
- Project type: Standard Grant
Collaborative Research: FET: Medium: Compact and Energy-Efficient Compute-in-Memory Accelerator for Deep Learning Leveraging Ferroelectric Vertical NAND Memory
- Grant number: 2312886
- Fiscal year: 2023
- Award amount: $392.2K
- Project type: Standard Grant
Collaborative Research: FET: Medium: Compact and Energy-Efficient Compute-in-Memory Accelerator for Deep Learning Leveraging Ferroelectric Vertical NAND Memory
- Grant number: 2312884
- Fiscal year: 2023
- Award amount: $392.2K
- Project type: Standard Grant