CAREER: Storage-Aware Fault Tolerance
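Basic information
- Grant number: 2339784
- Principal investigator:
- Amount: $699,600
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-02-15 to 2029-01-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract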
Fault-tolerant storage systems are at the heart of modern datacenters. These systems ensure that large-scale services (e.g., search, social networking) and critical services (e.g., e-commerce, healthcare) can reliably access essential data, even in the face of failures. However, a pressing concern when designing fault-tolerant storage systems is that their strong fault-tolerance guarantees come at the cost of performance. For instance, a fault-tolerant key-value store built using existing approaches can perform up to 20x worse than a stand-alone version of the store (that does not tolerate failures). This project aims to build fault-tolerant storage systems that closely approximate the performance of fault-intolerant (non-replicated) storage servers. The project will achieve this goal by systematically rethinking widely used fault-tolerance paradigms. This project will develop novel fault-tolerance protocols and abstractions and build new practical systems. In particular, it will first develop a new replication protocol optimized for modern storage devices. Second, it will explore a novel CPU-free replication approach that unlocks the full potential of remote direct memory access (RDMA). Third, it will realize a new fault-tolerance architecture tailored for emerging disaggregated datacenters. Finally, it will develop a new shared log abstraction for storage applications. The solutions developed in this project will enable the development of reliable and performant systems, eliminating the need to make compromises that threaten the data safety of critical applications. The effort will significantly contribute to education and outreach through new course offerings to introduce students to distributed systems research and hands-on labs to equip students to use modern hardware. The project will also broaden participation in computing through doctoral workshops and engagement in undergraduate research. 
The project will also bring distributed systems to a broader audience (including K-12 students) through new interactive frameworks. All the artifacts from the project will be made openly available with necessary documentation for ease of use. Finally, the PI will collaborate with industry partners to implement project results within real-world systems. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Aishwarya Ganesan
Fault-Tolerance, Fast and Slow: Exploiting Failure Asynchrony in Distributed Systems
- DOI: 10.5555/3291168.3291197
- Year: 2018
- Journal:
- Impact factor: 0
- Authors: Ramnatthan Alagappan; Aishwarya Ganesan; Jing Liu; Andrea C. Arpaci; Remzi H. Arpaci
- Corresponding author: Remzi H. Arpaci
Protocol-Aware Recovery for Consensus-Based Storage
- DOI: 10.1109/icdew.2011.5767656
- Year: 2018
- Journal:
- Impact factor: 0
- Authors: Ramnatthan Alagappan; Aishwarya Ganesan; Eric Lee; Aws Albarghouthi; Vijay Chidambaram; Andrea C. Arpaci; Remzi H. Arpaci
- Corresponding author: Remzi H. Arpaci
Physical Analytics: A New Frontier for (Indoor) Location Research
- DOI:
- Year: 2013
- Journal:
- Impact factor: 0
- Authors: R. Nandakumar; S. Rallapalli; Krishna Chintalapudi; V. Padmanabhan; L. Qiu; Aishwarya Ganesan; S. Guha; Deepanker Aggarwal; Aakash Goenka
- Corresponding author: Aakash Goenka
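Similar NSFC (National Natural Science Foundation of China) grants
SSD cache management optimization for in-storage intelligent computing
- Grant number: n/a
- Approval year: 2022
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project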
Similar overseas grants
CAREER: Datacenter-Aware Local Storage Stacks
- Grant number: 2340218
- Fiscal year: 2024
- Funding amount: $699,600
- Project type: Continuing Grant
Deduplication-aware Systems for Cost-efficient Cloud Storage
- Grant number: RGPIN-2017-04264
- Fiscal year: 2021
- Funding amount: $699,600
- Project type: Discovery Grants Program - Individual
Deduplication-aware Systems for Cost-efficient Cloud Storage
- Grant number: RGPIN-2017-04264
- Fiscal year: 2020
- Funding amount: $699,600
- Project type: Discovery Grants Program - Individual
Deduplication-aware Systems for Cost-efficient Cloud Storage
- Grant number: DGDND-2017-00073
- Fiscal year: 2019
- Funding amount: $699,600
- Project type: DND/NSERC Discovery Grant Supplement
Deduplication-aware Systems for Cost-efficient Cloud Storage
- Grant number: RGPIN-2017-04264
- Fiscal year: 2019
- Funding amount: $699,600
- Project type: Discovery Grants Program - Individual
SHF: Small: Turning Visual Noise into Hardware Efficiency: Viewer-Aware Energy-Quality Adaptive Mobile Video Storage
- Grant number: 1815430
- Fiscal year: 2018
- Funding amount: $699,600
- Project type: Standard Grant
Deduplication-aware Systems for Cost-efficient Cloud Storage
- Grant number: DGDND-2017-00073
- Fiscal year: 2018
- Funding amount: $699,600
- Project type: DND/NSERC Discovery Grant Supplement
SHF: Small: Turning Visual Noise into Hardware Efficiency: Viewer-Aware Energy-Quality Adaptive Mobile Video Storage
- Grant number: 1855706
- Fiscal year: 2018
- Funding amount: $699,600
- Project type: Standard Grant
Deduplication-aware Systems for Cost-efficient Cloud Storage
- Grant number: RGPIN-2017-04264
- Fiscal year: 2018
- Funding amount: $699,600
- Project type: Discovery Grants Program - Individual
Research on Power-Aware Large-Scale Storage Systems Based on Highly Accurate Access Pattern Prediction
- Grant number: 17H01718
- Fiscal year: 2017
- Funding amount: $699,600
- Project type: Grant-in-Aid for Scientific Research (B)