EAGER: Resilient, Energy Efficient HPC System Configuration
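Basic Information
- Award number: 1349521
- Principal investigator: Daniel Reed
- Amount: $298,800
- Institution country: United States
- Program type: Standard Grant
- Fiscal year: 2013
- Funding country: United States
- Period: 2013-10-01 to 2016-09-30
- Status: Completed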
Project Abstract
High-performance computing systems (supercomputers) continue to grow in size and complexity. Today's leading-edge systems contain tens of thousands of server nodes, and proposed next-generation systems are likely to contain hundreds of thousands. At this scale, keeping a system operational is increasingly difficult when some hardware component may fail every few minutes or hours. Larger systems also bring a complementary challenge in energy availability and cost: projected systems are expected to consume ten or more megawatts of power. For future high-performance computing systems to be usable and cost-effective, we must develop new design methodologies and operating principles that embody two important realities of large-scale systems: (a) frequent hardware component failures are part of normal operation, and (b) energy consumption and power costs must be managed as carefully as performance and resilience. As part of this research, the principal investigator will apply new ideas from commercial cloud computing to HPC systems, focusing on reliability and energy efficiency. This includes (1) models of high-performance computing system design that right-size hardware building blocks, balancing the operating costs of component replacement and repair against the capital costs of over-provisioning, and (2) incorporation of energy costs and constraints into scheduling systems and resource allocation, making computing costs visible to researchers. The deployment of very large-scale computing systems, which target science, engineering, and defense problems of critical national interest, is currently limited by both system reliability and energy consumption. New design and operating approaches for reliability and energy management can both reduce costs and broaden access, allowing computer companies to design larger systems, research institutions to deploy systems more widely, and researchers to better manage computational resources.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Daniel Reed
The impact of radiation dose to the urethra on brachytherapy-related dysuria.
- DOI: 10.1016/j.brachy.2004.10.008
- Published: 2005
- Impact factor: 1.9
- Authors: G. Merrick; W. Butler; K. Wallner; Z. Allen; R. Galbreath; J. Lief; Daniel Reed
- Corresponding author: Daniel Reed
A phase diagram of Ba1‐xCaxTiO3 (x = 0‐0.30) piezoceramics by Raman spectroscopy
- Published: 2018
- Impact factor: 0
- Authors: C. Shu; Daniel Reed; T. Button
- Corresponding author: T. Button
Observation of TiH5 and TiH7 in Bulk-Phase TiH3 Gels for Kubas-Type Hydrogen Storage
- Published: 2013
- Impact factor: 0
- Authors: T. Hoang; L. Morris; Daniel Reed; D. Book; M. Trudeau; D. Antonelli
- Corresponding author: D. Antonelli
Impact of Social Determinants of Health and Geographic Location on Real-World Outcomes in AML Patients Undergoing Low-Intensity Chemotherapy
- DOI: 10.1182/blood-2024-201720
- Published: 2024-11-05
- Authors: Valerie Tran; Gordon Smilnak; Nandan Srinivasa; Firas El Chaer; Michael Keng; Daniel Reed
- Corresponding author: Daniel Reed
Study of the NaBH4–NaBr system and the behaviour of its low temperature phase transition
- DOI: 10.1016/j.ijhydene.2017.03.045
- Published: 2017
- Impact factor: 7.2
- Authors: Christos Paterakis; Sheng Guo; M. Heere; Yinzhe Liu; Luisana Contreras; M. Sørby; B. Hauback; Daniel Reed; D. Book
- Corresponding author: D. Book
Other grants by Daniel Reed
CYBER-INSIGHT: Evaluating Cyberinfrastructure Total Cost of Ownership
- Award number: 1938985
- Fiscal year: 2019
- Funding amount: $298,800
- Program type: Standard Grant
Workshop Proposal: Rethinking NSF's Computational Ecosystem for 21st Century Science and Engineering
- Award number: 1836997
- Fiscal year: 2018
- Funding amount: $298,800
- Program type: Standard Grant
Doctoral Dissertation Research: An Ethnographic Study of Memory Entrepreneurship and Community
- Award number: 1823896
- Fiscal year: 2018
- Funding amount: $298,800
- Program type: Standard Grant
CYBER-INSIGHT: Evaluating Cyberinfrastructure Total Cost of Ownership
- Award number: 1812786
- Fiscal year: 2018
- Funding amount: $298,800
- Program type: Standard Grant
RAPID: Tracing the origin and fate of particulate organic matter in nearshore marine sediments
- Award number: 1623590
- Fiscal year: 2016
- Funding amount: $298,800
- Program type: Standard Grant
LTER: Land/Ocean Interactions and the Dynamics of Kelp Forest Ecosystems (SBC III)
- Award number: 1232779
- Fiscal year: 2012
- Funding amount: $298,800
- Program type: Continuing Grant
Collaborative Research: The effect of inbreeding on metapopulation dynamics of the giant kelp, Macrocystis pyrifera
- Award number: 1233283
- Fiscal year: 2012
- Funding amount: $298,800
- Program type: Standard Grant
LTER: Land/Ocean Interactions and the Dynamics of Kelp Forest Communities
- Award number: 0620276
- Fiscal year: 2006
- Funding amount: $298,800
- Program type: Continuing Grant
ITR: Intelligent High-Performance Computing on Toys
- Award number: 0434286
- Fiscal year: 2004
- Funding amount: $298,800
- Program type: Continuing Grant
ITR: Intelligent High-Performance Computing on Toys
- Award number: 0219597
- Fiscal year: 2002
- Funding amount: $298,800
- Program type: Continuing Grant
Similar international grants
CAREER: Resilient and Efficient Automatic Control in Energy Infrastructure: An Expert-Guided Policy Optimization Framework
- Award number: 2338559
- Fiscal year: 2024
- Funding amount: $298,800
- Program type: Standard Grant
NSF Engines Development Award: Developing an use-inspired decarbonization and grid resilient energy ecosystem (WV, PA)
- Award number: 2315455
- Fiscal year: 2024
- Funding amount: $298,800
- Program type: Cooperative Agreement
TROCI: Towards Resilient Operation of Critical Infrastructures - application to water and energy systems
- Award number: EP/Y036344/1
- Fiscal year: 2024
- Funding amount: $298,800
- Program type: Research Grant
Global Centers Track 2: Equitable and User-Centric Energy Market for Resilient Grid-interactive Communities
- Award number: 2330504
- Fiscal year: 2024
- Funding amount: $298,800
- Program type: Standard Grant
PFI-RP: Resilient and Energy-Efficient Memory Chips for Enhanced Mobile AI and Personalized Machine Learning
- Award number: 2345655
- Fiscal year: 2024
- Funding amount: $298,800
- Program type: Standard Grant
Enabling Innovative Space-driven Services for Energy Efficient Buildings and Climate Resilient Cities
- Award number: 10063705
- Fiscal year: 2023
- Funding amount: $298,800
- Program type: EU-Funded
Resilient design of energy pile foundations toward zero carbon buildings
- Award number: DP230102304
- Fiscal year: 2023
- Funding amount: $298,800
- Program type: Discovery Projects
Collaborative Research: Implementation: Medium: Secure, Resilient Cyber-Physical Energy System Workforce Pathways via Data-Centric, Hardware-in-the-Loop Training
- Award number: 2320972
- Fiscal year: 2023
- Funding amount: $298,800
- Program type: Standard Grant
Collaborative Research: Implementation: Medium: Secure, Resilient Cyber-Physical Energy System Workforce Pathways via Data-Centric, Hardware-in-the-Loop Training
- Award number: 2320975
- Fiscal year: 2023
- Funding amount: $298,800
- Program type: Standard Grant
CAREER: Leveraging physical properties of modern flash memory chips for resilient, secure, and energy-efficient edge storage systems
- Award number: 2346853
- Fiscal year: 2023
- Funding amount: $298,800
- Program type: Continuing Grant