SERT: Scale-free, Energy-aware, Resilient and Transparent Adaptation of CSE Applications to Mega-core Systems

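Basic Information

  • Grant number:
    EP/M01147X/1
  • Principal investigator:
  • Amount:
    $1.2282 million
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2015
  • Funding country:
    United Kingdom
  • Duration:
    2015 to (no data)
  • Project status:
    Completed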

Project Abstract

Moore's Law and Dennard scaling have led to dramatic performance increases in microprocessors, the basis of modern supercomputers, which consist of clusters of nodes that include microprocessors and memory. This design is deeply embedded in parallel programming languages, the runtime systems that orchestrate parallel execution, and computational science applications.

Some deviations from this simple, symmetric design have occurred over the years, but we have now pushed transistor scaling to the point where simplicity is giving way to complex architectures. The absence of Dennard scaling, which has not held for about a decade, and the atomic dimensions of transistors have profound implications for the architecture of current and future supercomputers.

Scalability limitations will arise from insufficient data access locality. Exascale systems will have up to 100x more cores and commensurately less memory space and bandwidth per core. However, in-situ data analysis, motivated by decreasing file system bandwidths, will increase the memory footprints of scientific applications. Thus, we must improve per-core data access locality and reduce contention and interference for shared resources.

Energy constraints will fundamentally limit the performance and reliability of future large-scale systems. These constraints lead many to predict a phenomenon of "dark silicon", in which half or more of the transistors on each chip must be powered down for safe operation. Low-power processor technologies based on sub-threshold or near-threshold voltage operation are a viable alternative. However, these techniques dramatically decrease the mean time to failure at scale and thus require new paradigms to sustain throughput and correctness.

Non-deterministic performance variation will arise from design process variation that leads to asymmetric performance and power consumption in architecturally symmetric hardware components. The manifestations of the asymmetries are non-deterministic and can vary with small changes to system components or software. This performance variation produces non-deterministic, non-algorithmic load imbalance.

Reliability limitations will stem from the massive number of system components, which proportionally reduces the mean time to failure, but also from component wear and from low-voltage operation, which introduces timing errors. Infrastructure-level power capping may also compromise application reliability or create severe load imbalances.

The impact of these changes in technology will travel as a shockwave throughout the software stack. For decades, we have designed computational science applications based on very strict assumptions that performance is uniform and processors are reliable. In the future, hardware will behave unpredictably, at times erratically. Software must compensate for this behavior.

Our research anticipates this future hardware landscape. Our ecosystem will combine binary adaptation, code refactoring, and approximate computation to prepare CSE applications for it. We will provide them with scale-freedom (the ability to run well at scale under dynamic execution conditions) with at most limited, platform-agnostic code refactoring. Our software will provide automatic load balancing and concurrency throttling to tame non-deterministic performance variations. Finally, our new form of user-controlled approximate computation will enable execution of CSE applications on hardware with low supply voltages, or on any form of faulty hardware, by selectively dropping or tolerating erroneous computation that arises from unreliable execution, thus saving energy.

Cumulatively, these tools will enable non-intrusive reengineering of major computational science libraries and applications (2DRMP, Code_Saturne, DL_POLY, LB3D) and prepare them for the next generation of UK supercomputers. The project partners with NAG, a leading UK HPC software and service provider.

Project Outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads
Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Aliaga J.
  • Corresponding author:
    Aliaga J.
TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods
  • DOI:
    10.1109/cluster.2016.86
  • Published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dichev K
  • Corresponding author:
    Dichev K
DARE: Data-Access Aware Refresh via Spatial-Temporal Application Resilience on Commodity Servers
Energy-efficient localised rollback via data flow analysis and frequency scaling
  • DOI:
    10.1145/3236367.3236379
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dichev K
  • Corresponding author:
    Dichev K

Other publications by Dimitrios Nikolopoulos

Modelling radon progeny concentration variations in thermal spas
  • DOI:
    10.1016/j.scitotenv.2006.11.017
  • Published:
    2007-02-01
  • Journal:
  • Impact factor:
  • Authors:
    Dimitrios Nikolopoulos;Efstratios Vogiannis
  • Corresponding author:
    Efstratios Vogiannis
CO2 and Radon Emissions as Precursors of Seismic Activity
  • DOI:
    10.1007/s41748-021-00229-2
  • Published:
    2021-06-18
  • Journal:
  • Impact factor:
    4.700
  • Authors:
    Simone D’Incecco;Ermioni Petraki;Georgios Priniotakis;Michail Papoutsidakis;Panayiotis Yannakopoulos;Dimitrios Nikolopoulos
  • Corresponding author:
    Dimitrios Nikolopoulos
Long-memory traces in PM10 time series in Athens, Greece: investigation through DFA and R/S analysis
  • DOI:
    10.1007/s00703-020-00744-3
  • Published:
    2020-05-28
  • Journal:
  • Impact factor:
    2.100
  • Authors:
    Dimitrios Nikolopoulos;Konstantinos Moustris;Ermioni Petraki;Demetrios Cantzos
  • Corresponding author:
    Demetrios Cantzos
Investigation of the exposure to radon and progeny in the thermal spas of Loutraki (Attica-Greece): Results from measurements and modelling
  • DOI:
    10.1016/j.scitotenv.2009.09.057
  • Published:
    2010-01-01
  • Journal:
  • Impact factor:
  • Authors:
    Dimitrios Nikolopoulos;Efstratios Vogiannis;Ermioni Petraki;Athanasios Zisos;Anna Louizi
  • Corresponding author:
    Anna Louizi
Primary superficial femoral vein leiomyosarcoma: Report of a case
  • DOI:
    10.1007/s00595-010-4507-6
  • Published:
    2011-10-04
  • Journal:
  • Impact factor:
    1.600
  • Authors:
    Dimitrios Yfadopoulos;Dimitrios Nikolopoulos;Evanthia Novi;Antonios Psaroudakis
  • Corresponding author:
    Antonios Psaroudakis

Other grants held by Dimitrios Nikolopoulos

U.S.-Ireland R&D Partnership: CNS: Small: SWEET: Hardware and Software for Sustainable Wearable Edge Intelligence
  • Grant number:
    2315851
  • Fiscal year:
    2023
  • Funding amount:
    $1.2282 million
  • Project type:
    Standard Grant
Heterogeneous Parallel and Distributed Computing with Java (HPDCJ)
  • Grant number:
    EP/M015750/1
  • Fiscal year:
    2015
  • Funding amount:
    $1.2282 million
  • Project type:
    Research Grant
Distributed Heterogeneous Vertically Integrated Energy Efficient Data Centres
  • Grant number:
    EP/M015742/1
  • Fiscal year:
    2015
  • Funding amount:
    $1.2282 million
  • Project type:
    Research Grant
ENPOWER
  • Grant number:
    EP/L004232/1
  • Fiscal year:
    2014
  • Funding amount:
    $1.2282 million
  • Project type:
    Research Grant
Abstraction-Level Energy Accounting and Optimisation in Many-core Programming Languages
  • Grant number:
    EP/L000555/1
  • Fiscal year:
    2013
  • Funding amount:
    $1.2282 million
  • Project type:
    Research Grant
GEMSCLAIM: GreenEr Mobile Systems by Cross LAyer Integrated energy Management
  • Grant number:
    EP/K017594/1
  • Fiscal year:
    2013
  • Funding amount:
    $1.2282 million
  • Project type:
    Research Grant
CAREER: A Unified Framework for Multilevel Parallelization on Deep Computing Systems
  • Grant number:
    0715051
  • Fiscal year:
    2006
  • Funding amount:
    $1.2282 million
  • Project type:
    Continuing Grant
CAREER: A Unified Framework for Multilevel Parallelization on Deep Computing Systems
  • Grant number:
    0346867
  • Fiscal year:
    2004
  • Funding amount:
    $1.2282 million
  • Project type:
    Continuing Grant

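Similar NSFC Grants

Scale-down mechanism and regulation of traditional solid-state fermentation processes based on heat transfer
  • Grant number:
    22108101
  • Approval year:
    2021
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Study of transient flow and cavitation mechanisms in axial-flow blood pumps based on a multi-scale model
  • Grant number:
    31600794
  • Approval year:
    2016
  • Funding amount:
    CNY 220,000
  • Project type:
    Young Scientists Fund
Research on compact routing for scale-free networks
  • Grant number:
    60673168
  • Approval year:
    2006
  • Funding amount:
    CNY 250,000
  • Project type:
    General Program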

Similar Overseas Grants

Scale-up of Coil Winding and Magnet Assembly Manufacturing Processes for a Rare Earth-Free Permanent Magnet Generator
  • Grant number:
    10059447
  • Fiscal year:
    2023
  • Funding amount:
    $1.2282 million
  • Project type:
    Collaborative R&D
SBIR Phase II: Clinical scale and testing of the first virus-free precision gene edited cell therapy for veterinary oncology
  • Grant number:
    2243587
  • Fiscal year:
    2023
  • Funding amount:
    $1.2282 million
  • Project type:
    Cooperative Agreement
Towards a carbon-free future: Using underground storage of hydrogen in porous rocks to enable grid-scale energy storage
  • Grant number:
    2894612
  • Fiscal year:
    2023
  • Funding amount:
    $1.2282 million
  • Project type:
    Studentship
Mid-Scale RI-2 Consortium: Compact X-ray Free-Electron Laser Project (CXFEL)
  • Grant number:
    2153503
  • Fiscal year:
    2023
  • Funding amount:
    $1.2282 million
  • Project type:
    Cooperative Agreement
EFFICIENT SCALE-UP OF IPS CELLS FOR AUTOLOGOUS CELL THERAPY WORKFLOW
  • Grant number:
    10822298
  • Fiscal year:
    2023
  • Funding amount:
    $1.2282 million
  • Project type:
NSF-DFG: Solvent-Free Manufacturing of Perovskite Large-Scale Electronics
  • Grant number:
    2135937
  • Fiscal year:
    2022
  • Funding amount:
    $1.2282 million
  • Project type:
    Standard Grant
Large Scale Expansion of Mesenchymal Stromal Cells using Dissolvable Microcarriers and Animal Product Free Recombinant Protein Coatings
  • Grant number:
    569481-2022
  • Fiscal year:
    2022
  • Funding amount:
    $1.2282 million
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
CDS&E: Accelerating Astrophysical Insight at Scale with Likelihood-Free Inference
  • Grant number:
    2206744
  • Fiscal year:
    2022
  • Funding amount:
    $1.2282 million
  • Project type:
    Continuing Grant
RAPID: Fare Free Public Transportation - A Full Scale Natural Experiment in Alexandria, Virginia
  • Grant number:
    2153689
  • Fiscal year:
    2022
  • Funding amount:
    $1.2282 million
  • Project type:
    Standard Grant
Free-space quantum communications at any scale
  • Grant number:
    2740406
  • Fiscal year:
    2022
  • Funding amount:
    $1.2282 million
  • Project type:
    Studentship