SERT: Scale-free, Energy-aware, Resilient and Transparent Adaptation of CSE Applications to Mega-core Systems
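Basic information
- Grant number: EP/M01147X/1
- Principal investigator:
- Amount: $1.2282 million
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2015
- Funding country: United Kingdom
- Duration: 2015 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract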
Moore's Law and Dennard scaling have led to dramatic performance increases in microprocessors, the basis of modern supercomputers, which consist of clusters of nodes that include microprocessors and memory. This design is deeply embedded in parallel programming languages, the runtime systems that orchestrate parallel execution, and computational science applications.
Some deviations from this simple, symmetric design have occurred over the years, but now we have pushed transistor scaling to the extent that simplicity is giving way to complex architectures. The absence of Dennard scaling, which has not held for about a decade, and the atomic dimensions of transistors have profound implications for the architecture of current and future supercomputers. Scalability limitations will arise from insufficient data access locality. Exascale systems will have up to 100x more cores and commensurately less memory space and bandwidth per core. However, in-situ data analysis, motivated by decreasing file system bandwidths, will increase the memory footprints of scientific applications. Thus, we must improve per-core data access locality and reduce contention and interference for shared resources.
Energy constraints will fundamentally limit the performance and reliability of future large-scale systems. These constraints lead many to predict a phenomenon of "dark silicon" in which half or more of the transistors on each chip must be powered down for safe operation. Low-power processor technologies based on sub-threshold or near-threshold voltage operation are a viable alternative. However, these techniques dramatically decrease the mean time to failure at scale and, thus, require new paradigms to sustain throughput and correctness.
Non-deterministic performance variation will arise from design process variation that leads to asymmetric performance and power consumption in architecturally symmetric hardware components. The manifestations of the asymmetries are non-deterministic and can vary with small changes to system components or software. This performance variation produces non-deterministic, non-algorithmic load imbalance. Reliability limitations will stem from the massive number of system components, which proportionally reduces the mean time to failure, but also from component wear and from low-voltage operation, which introduces timing errors. Infrastructure-level power capping may also compromise application reliability or create severe load imbalances.
The impact of these changes on technology will travel as a shockwave throughout the software stack. For decades, we have designed computational science applications based on very strict assumptions that performance is uniform and processors are reliable. In the future, hardware will behave unpredictably, at times erratically. Software must compensate for this behavior. Our research anticipates this future hardware landscape. Our ecosystem will combine binary adaptation, code refactoring, and approximate computation to prepare CSE applications. We will provide them with scale-freedom - the ability to run well at scale under dynamic execution conditions - with at most limited, platform-agnostic code refactoring. Our software will provide automatic load balancing and concurrency throttling to tame non-deterministic performance variations. Finally, our new form of user-controlled approximate computation will enable execution of CSE applications on hardware with low supply voltages, or any form of faulty hardware, by selectively dropping or tolerating erroneous computation that arises from unreliable execution, thus saving energy. Cumulatively, these tools will enable non-intrusive reengineering of major computational science libraries and applications (2DRMP, Code_Saturne, DL_POLY, LB3D) and prepare them for the next generation of UK supercomputers. The project partners with NAG, a leading UK HPC software and service provider.
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads
- DOI: 10.1145/3158643
- Published: 2017
- Journal:
- Impact factor: 1.6
- Authors: Georgakoudis G
- Corresponding author: Georgakoudis G
Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Aliaga J.
- Corresponding author: Aliaga J.
TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods
- DOI: 10.1109/cluster.2016.86
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Dichev K
- Corresponding author: Dichev K
DARE: Data-Access Aware Refresh via spatial-temporal application resilience on commodity servers
- DOI: 10.1177/1094342017718612
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Chalios C
- Corresponding author: Chalios C
Other publications by Dimitrios Nikolopoulos
Modelling radon progeny concentration variations in thermal spas
- DOI: 10.1016/j.scitotenv.2006.11.017
- Published: 2007-02-01
- Journal:
- Impact factor:
- Authors: Dimitrios Nikolopoulos; Efstratios Vogiannis
- Corresponding author: Efstratios Vogiannis
Investigation of the exposure to radon and progeny in the thermal spas of Loutraki (Attica-Greece): Results from measurements and modelling
- DOI: 10.1016/j.scitotenv.2009.09.057
- Published: 2010-01-01
- Journal:
- Impact factor:
- Authors: Dimitrios Nikolopoulos; Efstratios Vogiannis; Ermioni Petraki; Athanasios Zisos; Anna Louizi
- Corresponding author: Anna Louizi
Parallel Islands: A Parallel Computing Educational Video Game
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Melissa Cameron; Margaret Ellis; Dimitrios Nikolopoulos
- Corresponding author: Dimitrios Nikolopoulos
Study of indoor radon and radon in drinking water in Greece and Cyprus: Implications to exposure and dose
- DOI: 10.1016/j.radmeas.2008.03.043
- Published: 2008-08-01
- Journal:
- Impact factor:
- Authors: Dimitrios Nikolopoulos; Anna Louizi
- Corresponding author: Anna Louizi
Singular spectral and control chart analysis of soil radon and thoron time series for forecasting seismic activities
- DOI: 10.1016/j.jastp.2023.106108
- Published: 2023-08-01
- Journal:
- Impact factor:
- Authors: Awais Rasheed; Muhammad Osama; Dimitrios Nikolopoulos; Muhammad Rafique
- Corresponding author: Muhammad Rafique
Other grants held by Dimitrios Nikolopoulos
U.S.-Ireland R&D Partnership: CNS: Small: SWEET: Hardware and Software for Sustainable Wearable Edge Intelligence
- Grant number: 2315851
- Fiscal year: 2023
- Funding amount: $1.2282 million
- Project type: Standard Grant
Heterogeneous Parallel and Distributed Computing with Java (HPDCJ)
- Grant number: EP/M015750/1
- Fiscal year: 2015
- Funding amount: $1.2282 million
- Project type: Research Grant
Distributed Heterogeneous Vertically Integrated Energy Efficient Data Centres
- Grant number: EP/M015742/1
- Fiscal year: 2015
- Funding amount: $1.2282 million
- Project type: Research Grant
Abstraction-Level Energy Accounting and Optimisation in Many-core Programming Languages
- Grant number: EP/L000555/1
- Fiscal year: 2013
- Funding amount: $1.2282 million
- Project type: Research Grant
GEMSCLAIM: GreenEr Mobile Systems by Cross LAyer Integrated energy Management
- Grant number: EP/K017594/1
- Fiscal year: 2013
- Funding amount: $1.2282 million
- Project type: Research Grant
CAREER: A Unified Framework for Multilevel Parallelization on Deep Computing Systems
- Grant number: 0715051
- Fiscal year: 2006
- Funding amount: $1.2282 million
- Project type: Continuing Grant
CAREER: A Unified Framework for Multilevel Parallelization on Deep Computing Systems
- Grant number: 0346867
- Fiscal year: 2004
- Funding amount: $1.2282 million
- Project type: Continuing Grant
Similar National Natural Science Foundation of China (NSFC) grants
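Free, measurable panoramic reproduction of large-scale scenes based on image-based modelling and rendering
- Grant number: 41401522
- Approval year: 2014
- Funding amount: CNY 250,000
- Project type: Young Scientists Fund project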
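Research on filter algorithms for large-scale degenerate problems
- Grant number: 11101281
- Approval year: 2011
- Funding amount: CNY 220,000
- Project type: Young Scientists Fund project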
Research on compact routing for scale-free networks
- Grant number: 60673168
- Approval year: 2006
- Funding amount: CNY 250,000
- Project type: General Program project
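Research on scale-free network models of the Semantic Web and high-performance semantic search algorithms
- Grant number: 60503018
- Approval year: 2005
- Funding amount: CNY 230,000
- Project type: Young Scientists Fund project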
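Exploring the synchronizability and robustness of complex dynamical networks
- Grant number: 60304017
- Approval year: 2003
- Funding amount: CNY 230,000
- Project type: Young Scientists Fund project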
Similar overseas grants
Scale-up of Coil Winding and Magnet Assembly Manufacturing Processes for a Rare Earth-Free Permanent Magnet Generator
- Grant number: 10059447
- Fiscal year: 2023
- Funding amount: $1.2282 million
- Project type: Collaborative R&D
A next-generation extendable simulation environment for affordable, accurate, and efficient free energy simulations
- Grant number: 10638121
- Fiscal year: 2023
- Funding amount: $1.2282 million
- Project type:
Label-free single-cell imaging for quality control of cardiomyocyte biomanufacturing
- Grant number: 10675976
- Fiscal year: 2023
- Funding amount: $1.2282 million
- Project type:
SBIR Phase II: Clinical scale and testing of the first virus-free precision gene edited cell therapy for veterinary oncology
- Grant number: 2243587
- Fiscal year: 2023
- Funding amount: $1.2282 million
- Project type: Cooperative Agreement
Towards a carbon-free future: Using underground storage of hydrogen in porous rocks to enable grid-scale energy storage
- Grant number: 2894612
- Fiscal year: 2023
- Funding amount: $1.2282 million
- Project type: Studentship