Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
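Basic information
- Award number: 2403399
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Project period: 2024-07-01 to 2027-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract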
Supercomputers, or high-performance computing (HPC) clusters, are instrumental in propelling scientific and engineering research by offering vast computational resources. These systems are increasingly crucial as artificial intelligence (AI) techniques become pervasive across various fields, including climate modeling, drug discovery, and physics simulations, significantly expanding the need for computational power and data management. However, the existing HPC infrastructures face challenges with extended job wait times and suboptimal resource use, primarily due to the escalating complexity of computations and the burgeoning demands for resources. Unlike traditional HPC tasks, AI algorithms and models exhibit distinct resource requirements, often resulting in either excess or insufficient resource allocation for AI tasks. This project aims to bridge the gap between HPC resource provisioning and AI application demands through an in-depth analysis of resource allocation and utilization within HPC environments running AI workloads. The goal is to identify strategies for minimizing resource waste and reducing the length of job queues by efficiently reallocating idle resources to accommodate large-scale AI tasks. By creating and disseminating datasets, models, algorithms, and system source code, this initiative will contribute valuable tools and insights to the research community. The findings will be broadly shared through research papers, technical reports, book chapters, course materials, and tutorials, enhancing the knowledge base in both HPC and AI fields and supporting the broader objectives of promoting scientific progress, improving national health, prosperity, and welfare, and contributing to national defense.
This project centers on advancing the efficiency and productivity of HPC systems by innovatively leveraging idle resources to expedite AI job processing and diminish waiting periods. The research is structured around three interconnected themes, each addressing critical aspects of resource utilization and AI performance enhancement within HPC environments. The initial theme undertakes a comprehensive analysis of idle resources in HPC systems, aiming to identify patterns and opportunities for resource optimization. Building on the insights gained, the second theme explores methodologies for the safe and timely harvesting of idle resources across various categories, ensuring that these resources can be reallocated without compromising system stability or performance. The third theme is dedicated to developing strategies that utilize these harvested resources to boost AI application outcomes significantly and, by extension, enhance the overall productivity of HPC operations. The project will implement a tangible HPC testbed equipped with real-world benchmarks and workloads alongside these thematic investigations. This testbed will serve as a platform for empirically validating developed algorithms and systems, facilitating a rigorous assessment of their effectiveness in improving HPC resource allocation and utilization.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Seung-Jong Park
Quality changes in Pteridium aquilinum and the root of Platycodon grandiflorum frozen under different conditions
- DOI: 10.1016/j.ijrefrig.2014.04.004
- Published: 2014-07-01
- Journal:
- Impact factor:
- Authors: Seung-Jong Park; Mohammad Al Mijan; Kyung Bin Song
- Corresponding author: Kyung Bin Song
A Hadoop approach to advanced sampling algorithms in molecular dynamics simulation on cloud computing
- DOI: 10.1109/bibm.2013.6732534
- Published: 2013-01-01
- Journal:
- Impact factor: 0
- Authors: Jin Niu; Bai, Shuju; Seung-Jong Park
- Corresponding author: Seung-Jong Park
Energy-Aware Topology Control and Data Delivery in Wireless Sensor Networks
- DOI:
- Published: 2004-07
- Journal:
- Impact factor: 0
- Authors: Seung-Jong Park
- Corresponding author: Seung-Jong Park
Other grants held by Seung-Jong Park
IPA Agreement with Louisiana State University 1st year (Park 2021)
- Award number: 2120248
- Fiscal year: 2021
- Amount: $300,000
- Project type: Intergovernmental Personnel Award
SCC-Planning: Promoting Smart Technologies in Public Safety and Transportation to Improve Social and Economic Outcomes in a US EDA-Designated Critical Manufacturing Region
- Award number: 1737557
- Fiscal year: 2017
- Amount: $300,000
- Project type: Standard Grant
MRI: Acquisition of SuperMIC -- A Heterogeneous Computing Environment to Enable Transformation of Computational Research and Education in the State of Louisiana
- Award number: 1338051
- Fiscal year: 2013
- Amount: $300,000
- Project type: Standard Grant
CC-NIE Integration: Bridging, Transferring and Analyzing Big Data over 10Gbps Campus-Wide Software Defined Networks
- Award number: 1341008
- Fiscal year: 2013
- Amount: $300,000
- Project type: Standard Grant
MRI: CRON: Development of a Cyberinfrastructure Reconfigurable Optical Network for Large-Scale Scientific Discovery
- Award number: 0821741
- Fiscal year: 2008
- Amount: $300,000
- Project type: Standard Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
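Research on highly integrated microwave/millimeter-wave antennas supporting two-dimensional millimeter-wave beam scanning
- Award number: 62371263
- Year approved: 2023
- Amount: CNY 520,000
- Project type: General Program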
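Study of tandem Heck/denitrogenative rearrangement reactions of hydrazones
- Award number: 22301211
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund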
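Research on synergistic performance regulation and dendrite suppression mechanisms in aqueous zinc-ion batteries
- Award number: 52364038
- Year approved: 2023
- Amount: CNY 330,000
- Project type: Regional Science Fund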
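Pathogenic role and mechanism of TSPYL1 mutations in sudden infant death syndrome, studied with a human serotonergic neuron reporter system
- Award number: 82371176
- Year approved: 2023
- Amount: CNY 490,000
- Project type: General Program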
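Mechanism of trophoblast senescence induced by FOXO3 m6A methylation in the kidney-tonifying treatment of spontaneous abortion
- Award number: 82305286
- Year approved: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund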
Similar overseas grants
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Amount: $300,000
- Project type: Standard Grant