Tolerating faults in interconnection networks for parallel computing

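Basic information

  • Grant number:
    EP/G010587/1
  • Principal investigator:
  • Amount:
    $350,700
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2009
  • Funding country:
    United Kingdom
  • Duration:
    2009 to (no data)
  • Project status:
    Completed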

Project abstract

In distributed-memory multiprocessors, the dominant factor inhibiting faster global computation is inter-processor communication. Communication depends on the topology of the interconnection network, the routing mechanism, the flow-control policy and the switching method. We are concerned with issues relating to the topology of the interconnection network. How we connect the processors in a distributed-memory multiprocessor is a fundamental design decision, and there are numerous, often conflicting, considerations to bear in mind. For instance, we would like the interconnection network to be symmetric (to make programming and analysis easier), to have small diameter (to lessen message-passing latency), to be recursively decomposable (to aid scalability), to be highly connected (to improve fault tolerance and reliability), to be regular of low degree (to lessen communication overheads and design complexity), to support rapid and easy inter-processor communication, to support the simulation of machines based on other topologies, and so on. All of these properties lead to improved computational performance. However, no interconnection network is optimal on all counts, so trade-offs have to be made; a multitude of interconnection networks have been proposed, each with some good (topological) properties and some less good ones. When building distributed-memory multiprocessors with massive numbers of processors, some capacity for fault tolerance is required, since one would still wish the machine to be operative under (a limited number of) processor or link faults. What is required in terms of fault tolerance depends on the context, but the minimal requirement is that the (non-faulty portion of the) interconnection network should remain connected. Usually, however, more is required.
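
To make the topological trade-offs and the minimal fault-tolerance requirement above concrete, here is a small Python sketch that uses the binary hypercube Q_n as an assumed example topology (the abstract does not single out any particular network). In Q_n the nodes are the integers 0 to 2^n - 1 and two nodes are linked when their binary labels differ in exactly one bit, so every node has degree n and the diameter is n. The sketch asks whether the non-faulty portion of Q_n is still connected once a given set of nodes has failed. The function names are hypothetical illustrations, not code from the project.

```python
from collections import deque


def neighbours(v: int, n: int) -> list[int]:
    """Neighbours of node v in the n-dimensional hypercube Q_n: flip one bit."""
    return [v ^ (1 << i) for i in range(n)]


def bfs_distances(n: int, source: int,
                  faulty: frozenset[int] = frozenset()) -> dict[int, int]:
    """Hop distances from `source` to every node reachable without entering a faulty node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours(v, n):
            if w not in faulty and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(n: int) -> int:
    """Q_n is vertex-transitive, so the eccentricity of node 0 equals the diameter."""
    return max(bfs_distances(n, 0).values())


def survives(n: int, faulty: set[int]) -> bool:
    """True if the non-faulty nodes of Q_n still induce a connected network."""
    healthy = [v for v in range(1 << n) if v not in faulty]
    if not healthy:
        return False  # no working processor left
    reached = bfs_distances(n, healthy[0], frozenset(faulty))
    return len(reached) == len(healthy)


if __name__ == "__main__":
    print(diameter(4))                # 4
    print(survives(4, {1, 2, 4}))     # True: three faults cannot disconnect Q_4
    print(survives(4, {1, 2, 4, 8}))  # False: node 0 loses all four neighbours
```

Because Q_n has connectivity n, any n - 1 node faults leave it connected, while faulting all n neighbours of one node isolates that node, as the last call shows. Worst-case placements like this are one reason why fault studies often make assumptions about how faults are distributed.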
Other properties relevant to parallel computing include Hamiltonicity properties. The existence of Hamiltonian cycles in networks is of crucial importance, given the ubiquity of such cycles as data structures in many distributed algorithms (they are primarily used to facilitate message passing). Not only is the existence of Hamiltonian cycles important, but so is the existence of Hamiltonian paths and, more generally, of cycles and paths of different lengths. Hamiltonian (or, at least, long) paths are extremely useful because we regularly need to simulate linear-array computations in distributed-memory multiprocessors; a long path lets us cater for simulations that involve many different array lengths. In addition, given the ubiquity of cycle-based computations and algorithms in parallel computation, the simulation of cycle-based computations (of varying lengths) is as important as the simulation of linear-array-based computations.

The research in this proposal is all about tolerating faults in interconnection networks. There are three threads to the research:
  • the study of the existence of paths and cycles (of varying lengths) in various interconnection networks under conditional fault assumptions (that is, assumptions on the distribution of the faults in the network);
  • the study of fault tolerance in Optical Transpose Interconnect System (OTIS) networks;
  • the distributed construction of embedded structures within a faulty network.
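
The role of Hamiltonian cycles and paths in simulating rings and linear arrays can be illustrated, again on the hypercube Q_n as an assumed example, with the reflected binary Gray code. Consecutive Gray codewords differ in exactly one bit, and so do the last and first, so listing all 2^n codewords in order walks a Hamiltonian cycle of Q_n (for n >= 2); any prefix of length k is a path onto which a k-process linear array maps with array neighbours on adjacent processors. This is a standard fault-free construction offered as a sketch, not the project's method for faulty networks, and the function names are hypothetical.

```python
def gray(i: int) -> int:
    """i-th codeword of the reflected binary Gray code."""
    return i ^ (i >> 1)


def hamiltonian_cycle(n: int) -> list[int]:
    """All 2**n nodes of Q_n in Gray-code order; a Hamiltonian cycle when n >= 2."""
    if n < 2:
        raise ValueError("Q_n contains a cycle only for n >= 2")
    return [gray(i) for i in range(1 << n)]


def is_hamiltonian_cycle(cycle: list[int], n: int) -> bool:
    """Every node of Q_n appears once and cyclically consecutive nodes differ in one bit."""
    if sorted(cycle) != list(range(1 << n)):
        return False
    pairs = zip(cycle, cycle[1:] + cycle[:1])  # includes the wrap-around link
    return all(bin(a ^ b).count("1") == 1 for a, b in pairs)


def embed_linear_array(length: int, n: int) -> list[int]:
    """Place array positions 0..length-1 on Q_n so array neighbours are network neighbours."""
    if not 1 <= length <= (1 << n):
        raise ValueError("array length must be between 1 and 2**n")
    return [gray(i) for i in range(length)]


if __name__ == "__main__":
    print(hamiltonian_cycle(3))                            # [0, 1, 3, 2, 6, 7, 5, 4]
    print(is_hamiltonian_cycle(hamiltonian_cycle(3), 3))   # True
    print(embed_linear_array(5, 3))                        # [0, 1, 3, 2, 6]
```

Every ring or array link is carried by a single network link (dilation 1). The sketch assumes no faults; whether comparable paths and cycles survive when nodes or links fail is exactly the kind of question the first research thread addresses.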

Project outputs

Journal articles (10)
Monographs (0)
Scientific awards (0)
Conference papers (0)
Patents (0)
A general technique to establish the asymptotic conditional diagnosability of interconnection networks
  • DOI:
    10.1016/j.tcs.2012.05.015
  • Published:
    2012-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    I. A. Stewart
  • Corresponding author:
    I. A. Stewart
On the Computational Complexity of Routing in Faulty k-ary n-Cubes and Hypercubes
  • DOI:
    10.1142/s012962641250003x
  • Published:
    2012-04
  • Journal:
  • Impact factor:
    0
  • Authors:
    I. A. Stewart
  • Corresponding author:
    I. A. Stewart

Other publications by Iain Stewart

Commonly prescribed medications and risk of pneumonia and all-cause mortality in people with idiopathic pulmonary fibrosis: a UK population-based cohort study
  • DOI:
    10.1186/s41479-024-00155-7
  • Published:
    2025-01-25
  • Journal:
  • Impact factor:
    6.200
  • Authors:
    Ann D. Morgan;Georgie M. Massen;Hannah R. Whittaker;Iain Stewart;Gisli Jenkins;Peter M. George;Jennifer K. Quint
  • Corresponding author:
    Jennifer K. Quint
Raymond Aron and Liberal Thought in the Twentieth Century
  • DOI:
    10.1017/9781108695879
  • Published:
    2019
  • Journal:
  • Impact factor:
    4.2
  • Authors:
    Iain Stewart
  • Corresponding author:
    Iain Stewart
The North Rockies Mountain Snowmobilers in the Absence of a Daily Public Avalanche Bulletin
  • DOI:
  • Published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. Duncan;Iain Stewart
  • Corresponding author:
    Iain Stewart
Conducting a SUHI study
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Iain Stewart;G. Mills
  • Corresponding author:
    G. Mills
An iterated search for influence from the future on the Large Hadron Collider
  • DOI:
  • Published:
    2007
  • Journal:
  • Impact factor:
    0
  • Authors:
    Iain Stewart
  • Corresponding author:
    Iain Stewart

Other grants held by Iain Stewart

ALGOUK - A Network for Algorithms and Complexity in the UK
  • Grant number:
    EP/R005613/1
  • Fiscal year:
    2017
  • Amount:
    $350,700
  • Project type:
    Research Grant
Interconnection Networks: Practice unites with Theory (INPUT)
  • Grant number:
    EP/K015680/1
  • Fiscal year:
    2013
  • Amount:
    $350,700
  • Project type:
    Research Grant
Quantified Constraints and Generalisations
  • Grant number:
    EP/G020604/1
  • Fiscal year:
    2009
  • Amount:
    $350,700
  • Project type:
    Research Grant
Finite and Algorithmic Model Theory
  • Grant number:
    EP/D056853/1
  • Fiscal year:
    2006
  • Amount:
    $350,700
  • Project type:
    Research Grant

Similar National Natural Science Foundation of China (NSFC) grants

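Quantitative study of key problems in fault diagnosis for refrigeration systems
  • Grant number:
    50876059
  • Approval year:
    2008
  • Amount:
    CNY 300,000
  • Project type:
    General Program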

Similar overseas grants

CAREER: Strengthening the Theoretical Foundations of Federated Learning: Utilizing Underlying Data Statistics in Mitigating Heterogeneity and Client Faults
  • Grant number:
    2340482
  • Fiscal year:
    2024
  • Amount:
    $350,700
  • Project type:
    Continuing Grant
Postdoctoral Fellowship: EAR-PF: To roll, flow, or fracture - that is the question: Investigating the mechanisms behind friction and the stability of faults
  • Grant number:
    2305630
  • Fiscal year:
    2024
  • Amount:
    $350,700
  • Project type:
    Fellowship Award
Attent- an advanced edge AI system that is able to detect electrical faults before they happen.
  • Grant number:
    10114569
  • Fiscal year:
    2024
  • Amount:
    $350,700
  • Project type:
    Collaborative R&D
Center Operations: The Coupled Evolution of Earthquakes, Faults, and Geohazards of the San Andreas Fault System
  • Grant number:
    2225216
  • Fiscal year:
    2023
  • Amount:
    $350,700
  • Project type:
    Cooperative Agreement
Collaborative Research: Seismic cycles and earthquake nucleation on heterogeneous faults: Large-scale laboratory experiments, numerical simulations, and Whillans ice stream
  • Grant number:
    2240375
  • Fiscal year:
    2023
  • Amount:
    $350,700
  • Project type:
    Continuing Grant
Illuminating the seismogenic zones of large, hazardous faults with seismic arrays
  • Grant number:
    NE/W008289/1
  • Fiscal year:
    2023
  • Amount:
    $350,700
  • Project type:
    Fellowship
Collaborative Research: Seismic cycles and earthquake nucleation on heterogeneous faults: Large-scale laboratory experiments, numerical simulations, and Whillans ice stream
  • Grant number:
    2240376
  • Fiscal year:
    2023
  • Amount:
    $350,700
  • Project type:
    Continuing Grant
Faults and fluids in accretionary orogens
  • Grant number:
    RGPIN-2021-03318
  • Fiscal year:
    2022
  • Amount:
    $350,700
  • Project type:
    Discovery Grants Program - Individual
External visual observers for understanding manufacturing operations & predicting faults
  • Grant number:
    577810-2022
  • Fiscal year:
    2022
  • Amount:
    $350,700
  • Project type:
    Alliance Grants
Development of the new method to detect water-conducting faults and fractures by high-precision gas concentration mapping
  • Grant number:
    22K05011
  • Fiscal year:
    2022
  • Amount:
    $350,700
  • Project type:
    Grant-in-Aid for Scientific Research (C)