XPS: CLCCA: LigHTS: Lagging-Hardware Tolerant Systems
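Basic Information
- Award number: 1336580
- Principal investigator:
- Amount: $749,900
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2013
- Funding country: United States
- Project period: 2013-09-15 to 2017-08-31
- Project status: Completed
- Source:
- Keywords: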
Project Abstract
With the advent of scalable parallel computing, thousands of devices are connected and managed collectively. This era faces a new challenge, performance failure: systems often perform worse than expected because of large-scale management issues such as hardware failures, software bugs, and configuration mistakes. This project targets one overlooked cause of performance failure, "lagging hardware" (hardware whose performance has degraded significantly relative to its specification). Many reports indicate that a single piece of lagging hardware can easily cascade and make the performance of a whole cluster collapse. When that happens, parallelism goes unexploited, productivity drops, the system is underutilized, and energy is wasted. The goal of the LigHTS project is to transform computing systems into Lagging-Hardware Tolerant Systems.
The LigHTS project will bring many direct benefits to society. Users from many areas (science, healthcare, business, education, military, and government) increasingly rely on large-scale storage and computation services. For these users, predictable performance is a key to success, and lagging-hardware tolerant computing is a critical ingredient.
The LigHTS project has three major objectives. The first is lagging-hardware data analysis and instrumentation. To improve the robustness of future parallel systems, it is crucial to study the lagging characteristics exhibited by modern hardware and to devise new instrumentation methodologies that can collect cases of lagging hardware in deployment. The second is lagging-failure system analysis. It is important to rigorously analyze the impact of lagging hardware (including disks, networks, and processors) on currently deployed systems. The results will unearth design flaws and provide valuable reevaluations of how deployed systems should evolve. The last is LigHTS principles, design, and implementation. There is a need to establish foundational principles of lagging-hardware tolerant computing and to apply them in building prototypes of cross-layer LigHTS systems spanning distributed storage, computing frameworks, and operating and runtime systems.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants by Haryadi Gunawi
Collaborative Research: PPoSS: LARGE: ScaleStuds: Foundations for Correctness Checkability and Performance Predictability of Systems at Scale
- Award number: 2119184
- Fiscal year: 2021
- Award amount: $749,900
- Project category: Continuing Grant
PPoSS: Planning: CP2: Towards Systems Correctness Checkability and Performance Predictability at Scale
- Award number: 2028427
- Fiscal year: 2020
- Award amount: $749,900
- Project category: Standard Grant
USENIX FAST 2017 NSF Student Travel Support
- Award number: 1727380
- Fiscal year: 2017
- Award amount: $749,900
- Project category: Standard Grant
CSR: Medium: Combating Distributed Concurrency Bugs in Cloud Systems
- Award number: 1563956
- Fiscal year: 2016
- Award amount: $749,900
- Project category: Continuing Grant
CSR: Small: BreezeFS: File System Transformation for Cloud and Multistore Era
- Award number: 1526304
- Fiscal year: 2015
- Award amount: $749,900
- Project category: Standard Grant
CAREER: DrCloud: Drill-Ready Cloud Computing
- Award number: 1350499
- Fiscal year: 2014
- Award amount: $749,900
- Project category: Continuing Grant
DC: Small: Collaborative Research: DARE: Declarative and Scalable Recovery
- Award number: 1321958
- Fiscal year: 2012
- Award amount: $749,900
- Project category: Standard Grant
DC: Small: Collaborative Research: DARE: Declarative and Scalable Recovery
- Award number: 1016924
- Fiscal year: 2010
- Award amount: $749,900
- Project category: Standard Grant
Similar Overseas Grants
XPS: CLCCA: Scalable Parallelism for Irregular and Graph Applications
- Award number: 1335466
- Fiscal year: 2013
- Award amount: $749,900
- Project category: Standard Grant
XPS: CLCCA (XPS: DSD) Future Extreme Scale Frameworks using DSL and ERTS
- Award number: 1337145
- Fiscal year: 2013
- Award amount: $749,900
- Project category: Standard Grant
XPS: CLCCA: On the Hunt for Correctness and Performance Bugs in Large-scale Programs
- Award number: 1337158
- Fiscal year: 2013
- Award amount: $749,900
- Project category: Standard Grant
XPS: CLCCA: Improving Parallel Program Reliability Through Novel Approaches to Precise Dynamic Data Race Detection
- Award number: 1337174
- Fiscal year: 2013
- Award amount: $749,900
- Project category: Standard Grant
XPS: CLCCA: Enhancing the Programmability of Heterogeneous Manycore Systems
- Award number: 1337147
- Fiscal year: 2013
- Award amount: $749,900
- Project category: Standard Grant
XPS: CLCCA: Allocating Heterogeneous Datacenter Hardware to Strategic Agents
- Award number: 1337215
- Fiscal year: 2013
- Award amount: $749,900
- Project category: Standard Grant