Reliability, Performability and Scalability of Large-Scale Distributed Systems
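Basic information
- Grant number: 9010240
- Principal investigator:
- Amount: $64,200
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 1990
- Funding country: United States
- Start and end dates: 1990-07-01 to 1992-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract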
Large-scale distributed multicomputer systems, consisting of several thousand processing elements, are rapidly demonstrating their potential as low-cost, high-performance supercomputers. Not only can these systems speed up program execution, but they also allow significantly larger problems to be addressed. Widespread use of these systems in mission-critical as well as commercial applications, however, depends on their demonstrated reliability, availability and scalability. The objective of this research project is to investigate the reliability, scalability and performability of large-scale distributed systems. As the number of elements in a system increases, the failure rate of the system is expected to increase, given a constant technology. System reliability and scalability are therefore important considerations in the design of large-scale systems. The research will focus on two essential issues: the analysis of network reliability and performability, and the evaluation of techniques that can exploit the inherent redundancy of these systems. The network reliability analysis will examine the effects of multiple node and link failures on the connectivity of the network and on its communication bandwidth, investigating the probability of network disconnection, saturation and communication bottlenecks. The inherent hardware redundancy of large-scale systems can be exploited to achieve higher reliability, albeit at the cost of reduced computing power. The second objective will be to investigate the achievable performance/reliability tradeoff and system scalability using various redundancy schemes. The research is essentially analytical in nature but will rely on simulation techniques whenever an exact analytical evaluation is not feasible.
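The abstract's question of how link failures affect network connectivity, and its fallback to simulation when exact analysis is not feasible, can be illustrated with a small Monte Carlo sketch. This is not the project's method: the hypercube topology, the assumption that links fail independently with equal probability, and the function names `hypercube_links`, `is_connected` and `disconnection_probability` are all hypothetical choices made for illustration. Node failures are not modeled.

```python
import random


def hypercube_links(dim):
    """Return the links of a dim-dimensional hypercube as (u, v) pairs with u < v."""
    links = []
    for u in range(1 << dim):
        for bit in range(dim):
            v = u ^ (1 << bit)  # neighbor differs from u in exactly one bit
            if u < v:
                links.append((u, v))
    return links


def is_connected(num_nodes, links):
    """Check whether the given links connect all nodes, using union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_nodes
    for u, v in links:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return components == 1


def disconnection_probability(dim, link_failure_prob, trials, seed=0):
    """Estimate the probability that the hypercube is disconnected.

    Assumption (illustrative only): each link fails independently with
    probability link_failure_prob; nodes never fail.
    """
    rng = random.Random(seed)
    num_nodes = 1 << dim
    links = hypercube_links(dim)
    disconnected = 0
    for _ in range(trials):
        surviving = [link for link in links if rng.random() >= link_failure_prob]
        if not is_connected(num_nodes, surviving):
            disconnected += 1
    return disconnected / trials


if __name__ == "__main__":
    # Compare the estimate across system sizes at a fixed per-link failure probability.
    for dim in (3, 5, 7):
        estimate = disconnection_probability(dim, link_failure_prob=0.2, trials=2000)
        print(f"dim={dim} nodes={1 << dim} estimated P(disconnected)={estimate:.4f}")
```

Running the script prints one estimate per system size. The estimates depend on the seed and the trial count, so they should be read as rough indications rather than exact reliabilities.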
Project outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Walid Najjar
High performance FPGA and GPU complex pattern matching over spatio-temporal streams
- DOI: 10.1007/s10707-014-0217-3
- Publication date: 2014-08-26
- Journal:
- Impact factor: 2.600
- Authors: Roger Moussalli; Ildar Absalyamov; Marcos R. Vieira; Walid Najjar; Vassilis J. Tsotras
- Corresponding author: Vassilis J. Tsotras
On the Hu 2003 Plasticity Criterion
- DOI: 10.1007/s11665-023-08700-z
- Publication date: 2023-09-12
- Journal:
- Impact factor: 2.000
- Authors: Walid Najjar; Imed Ghaouss; Idriss Tiba; Philippe Dal Santo
- Corresponding author: Philippe Dal Santo
Other grants by Walid Najjar
SHF: Small: Automatic Generation of Hardware Threads on Programmable Fabrics
- Grant number: 1219180
- Fiscal year: 2012
- Funding amount: $64,200
- Project category: Standard Grant
CPA-CSA: Hardware Support for FPGA-Based Code Acceleration
- Grant number: 0811416
- Fiscal year: 2008
- Funding amount: $64,200
- Project category: Continuing Grant
SGER: Hardware/Software Partitioning for Multiprocessor and Multicore Acceleration
- Grant number: 0745490
- Fiscal year: 2007
- Funding amount: $64,200
- Project category: Standard Grant
Similar overseas grants
OTERA-III: Online Test Strategies for Reliable Reconfigurable Architectures - From Reliability to Guaranteed System Performability: A Multi-Layer Approach
- Grant number: 182065442
- Fiscal year: 2010
- Funding amount: $64,200
- Project category: Priority Programmes
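Effiziente Analyseverfahren für die Performability-Bewertung verteilter Systeme (Efficient analysis methods for the performability evaluation of distributed systems)
- Grant number: 55928509
- Fiscal year: 2007
- Funding amount: $64,200
- Project category: Research Grants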
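Effiziente Verifikation von Performability-Eigenschaften verteilter Systeme (Efficient verification of performability properties of distributed systems)
- Grant number: 5405376
- Fiscal year: 2003
- Funding amount: $64,200
- Project category: Research Grants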
Performability and reliability of communications networks
- Grant number: 579-1993
- Fiscal year: 1996
- Funding amount: $64,200
- Project category: Discovery Grants Program - Individual
Conference on Performability in Hardware/Software/Human Computing Systems; New Brunswick, NJ, December 14-15, 1995
- Grant number: 9414708
- Fiscal year: 1995
- Funding amount: $64,200
- Project category: Standard Grant
Performability and reliability of communications networks
- Grant number: 579-1993
- Fiscal year: 1995
- Funding amount: $64,200
- Project category: Discovery Grants Program - Individual
Performability and reliability of communications networks
- Grant number: 579-1993
- Fiscal year: 1994
- Funding amount: $64,200
- Project category: Discovery Grants Program - Individual
Research Planning Grant: Adaptive Performability Management
- Grant number: 9796223
- Fiscal year: 1994
- Funding amount: $64,200
- Project category: Standard Grant
Performability and reliability of communications networks
- Grant number: 579-1993
- Fiscal year: 1993
- Funding amount: $64,200
- Project category: Discovery Grants Program - Individual
Research Planning Grant: Adaptive Performability Management
- Grant number: 9308912
- Fiscal year: 1993
- Funding amount: $64,200
- Project category: Standard Grant