CRII: CSR: Towards Pinpointing the Root Causes of Failures in Flash-based Storage Systems
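Basic Information
- Award number: 1855565
- Principal investigator:
- Amount: $59,500
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-08-11 to 2020-04-30
- Status: Completed
- Source:
- Keywords:
Project Abstract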
Storage systems that hold financial transactions, scientific computation results, a family's photos, and more are undergoing dramatic changes. Triggered by advances in flash memory technology, almost every layer of storage systems is being optimized aggressively to make full use of flash. This trend increases the complexity of the systems and the chance of obscure failures. Failures in flash-based storage systems are challenging to diagnose even for data center professionals, and can lead to severe downtime or even data loss. Moreover, the difficulty of failure diagnosis will increase in the foreseeable future with the rapid growth of flash and other non-volatile memory technologies. Thus, a framework for diagnosing the whole system and pinpointing the root causes of failures is urgently needed.
The goal of this research project is to enable precise, end-to-end diagnosis of failures in flash-based storage systems. One key observation is that more and more kernel components are being optimized aggressively for flash, so a diagnosis framework is needed that minimizes the dependence on the kernel. On the other hand, although many components are changing, some fundamental abstractions such as files and logical blocks, and interfaces such as system calls and command sets, are relatively stable. A diagnosis framework can focus on these invariants to analyze the behavior of the rapidly evolving systems. Based on these observations, this research project investigates a novel failure diagnosis framework that separates failure domains based on the interfaces and automatically correlates the fundamental operations across layers. More specifically, the framework consists of two synergistic components: (1) an on-drive device agent to record and reproduce the host-device interactions without any dependence on the host, and (2) an end-to-end analyzer to provide in-depth diagnosis support along the whole data path from the application to the device.
This research project will advance the dependability of storage systems for various important data in modern society. The improvements to data storage will in turn benefit society in many other ways, such as reducing maintenance costs and human effort and avoiding financial loss due to service downtime or data corruption. This research project will also contribute to the curriculum of operating systems and other related courses, and will engage undergraduate and graduate students in systems research. In addition, the project will be integrated into the Alliance for Minority Participation Program and the Young Women in Computing Program to engage students from traditionally underrepresented groups, serving our national, regional, and local interest in increasing diversity in computing research.
Project Outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A study of persistent memory bugs in the Linux kernel
- DOI: 10.1145/3456727.3463783
- Published: 2021-06
- Journal:
- Impact factor: 0
- Authors: Duo Zhang; Om Rameshwar Gatla; Wei Xu; Mai Zheng
- Corresponding authors: Duo Zhang; Om Rameshwar Gatla; Wei Xu; Mai Zheng
Analyzing Configuration Dependencies of DAX File Systems
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Mahmud, Tabassum; Gatla, Om R.; Zhang, Duo; Love, Carson; Bumann, Ryan; Zheng, Mai
- Corresponding author: Zheng, Mai
Towards Robust File System Checkers
- DOI: 10.1145/3281031
- Published: 2018-12
- Journal:
- Impact factor: 0
- Authors: Om Rameshwar Gatla; Muhammad Hameed; Mai Zheng; Viacheslav Dubeyko; A. Manzanares; F. Blagojevic; Cyril Guyot; R. Mateescu
- Corresponding authors: Om Rameshwar Gatla; Muhammad Hameed; Mai Zheng; Viacheslav Dubeyko; A. Manzanares; F. Blagojevic; Cyril Guyot; R. Mateescu
Understanding Persistent-memory-related Issues in the Linux Kernel
- DOI: 10.1145/3605946
- Published: 2023-07
- Journal:
- Impact factor: 1.7
- Authors: Om Rameshwar Gatla; Duo Zhang; Wei Xu; Mai Zheng
- Corresponding authors: Om Rameshwar Gatla; Duo Zhang; Wei Xu; Mai Zheng
Other publications by Mai Zheng
On Failure Diagnosis of the Storage Stack
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Duo Zhang; Om Rameshwar Gatla; Runzhou Han; Mai Zheng
- Corresponding author: Mai Zheng
Performance and mechanism of ammonia production by electrocatalytic nitrate reduction based on dodecahydro-closo-dodecaborate hybrid
- DOI: 10.1016/j.jcis.2023.08.132
- Published: 2023-12-15
- Journal:
- Impact factor: 9.700
- Authors: Jiajia Wang; Xuefan Deng; Haixu Zhao; Xun Liu; Mai Zheng; Zan Jiang; Long Zhang; Haibo Zhang
- Corresponding author: Haibo Zhang
Strong metal-support interactions for high sintering resistance of Ru-based catalysts toward the HER and ORR
- DOI: 10.1039/d3cc02529b
- Published: 2023-08-22
- Journal:
- Impact factor: 4.200
- Authors: Xuzhuo Sun; Baofan Wu; Bo Li; Jiashou Zhao; Shanshan Li; Mai Zheng; Jing Chen; Haibo Zhang
- Corresponding author: Haibo Zhang
A command-level study of Linux kernel bugs
- DOI:
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Yiliang Shi; Danny Murillo; Simeng Wang; Jinrui Cao; Mai Zheng
- Corresponding author: Mai Zheng
A Cross-Layer Approach for Diagnosing Storage System Failures
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Duo Zhang; C. Gupta; Mai Zheng; A. Manzanares; F. Blagojevic; Cyril Guyot
- Corresponding author: Cyril Guyot
Other grants by Mai Zheng
CAREER: Towards Full-Stack Crash Consistency
- Award number: 1943204
- Fiscal year: 2020
- Amount: $59,500
- Award type: Continuing Grant
SHF: Small: Collaborative Research: A Parallel Graph-Based Paradigm for HPC Parallel File System Checkers
- Award number: 1910747
- Fiscal year: 2019
- Amount: $59,500
- Award type: Standard Grant
SHF: Small: Collaborative Research: Uncovering Vulnerabilities in Parallel File Systems for Reliable High Performance Computing
- Award number: 1853714
- Fiscal year: 2018
- Amount: $59,500
- Award type: Standard Grant
SHF: Small: Collaborative Research: Uncovering Vulnerabilities in Parallel File Systems for Reliable High Performance Computing
- Award number: 1717630
- Fiscal year: 2017
- Amount: $59,500
- Award type: Standard Grant
CRII: CSR: Towards Pinpointing the Root Causes of Failures in Flash-based Storage Systems
- Award number: 1566554
- Fiscal year: 2016
- Amount: $59,500
- Award type: Standard Grant
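Similar NSFC (National Natural Science Foundation of China) grants
Clinical application study of combined sinew-needle and chiropractic therapy, based on meridian-sinew theory, for treating CSR pain
- Award number:
- Approval year: 2025
- Funding amount: CNY 0
- Project type: Provincial/municipal-level project
Molecular mechanism by which the RAC2 (G15D) mutation participates in Ig-CSR in B cells
- Award number: 2025JJ80630
- Approval year: 2025
- Funding amount: CNY 0
- Project type: Provincial/municipal-level project
Preventing aminoglycoside-induced ototoxic deafness by regulating CSR1 gene expression with CRISPR/CasRx
- Award number: 2024Y9183
- Approval year: 2024
- Funding amount: CNY 250,000
- Project type: Provincial/municipal-level project
Mechanism by which the Fengshen-Songtiao manipulation (奉伸松调法) treats CSR by regulating cervical muscle cell autophagy and the plasticity of DRG nociceptive neurons, explored via Piezo mechanosensitive channels
- Award number:
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Regional Science Fund project
Differential effects, mechanisms, and management strategies of digital CSR communication on brand performance from a parasocial-interaction perspective
- Award number: 72362008
- Approval year: 2023
- Funding amount: CNY 280,000
- Project type: Regional Science Fund project
Do good deeds yield good results? Cross-level effects of embedded and peripheral CSR on employee well-being in the post-pandemic era
- Award number: 72102183
- Approval year: 2021
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund project
Do good deeds yield good results? Cross-level effects of embedded and peripheral CSR on employee well-being in the post-pandemic era
- Award number:
- Approval year: 2021
- Funding amount: CNY 300,000
- Project type:
Central analgesic effects and mechanisms of "qi-regulating" electroacupuncture at distal acupoints in CSR model rats, explored through spinal synaptic plasticity
- Award number: 82160934
- Approval year: 2021
- Funding amount: CNY 340,000
- Project type: Regional Science Fund project
Studying low-temperature, high-density nuclear matter in the CSR energy region using transport models and machine learning
- Award number: U2032145
- Approval year: 2020
- Funding amount: CNY 500,000
- Project type: Joint Fund project
Molecular mechanism by which the PPR family protein CSR3 regulates chloroplast RNA splicing in Arabidopsis
- Award number: 32000184
- Approval year: 2020
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund project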
Similar overseas grants
CRII: CSR: Towards an Edge-enabled Software-Defined Vehicle Framework for Dynamic Over-the-Air Updates
- Award number: 2348151
- Fiscal year: 2024
- Amount: $59,500
- Award type: Standard Grant
Collaborative Research: CSR: Medium: Towards A Unified Memory-centric Computing System with Cross-layer Support
- Award number: 2310422
- Fiscal year: 2023
- Amount: $59,500
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Towards A Unified Memory-centric Computing System with Cross-layer Support
- Award number: 2310423
- Fiscal year: 2023
- Amount: $59,500
- Award type: Continuing Grant
CSR: Small: Decoupling File System from Volatile Main Memory: A First Step towards a Single-Level Persistent Store
- Award number: 1813485
- Fiscal year: 2018
- Amount: $59,500
- Award type: Standard Grant
CSR: Small: Towards Programming Datacenters
- Award number: 1817116
- Fiscal year: 2018
- Amount: $59,500
- Award type: Standard Grant
CSR: Small: Towards Efficient Deep Inference for Mobile Applications
- Award number: 1815619
- Fiscal year: 2018
- Amount: $59,500
- Award type: Standard Grant
CSR: Medium: Collaborative Research: Towards Finer-grained Cloud Computing
- Award number: 1819109
- Fiscal year: 2017
- Amount: $59,500
- Award type: Continuing Grant
CRII: CSR: Towards Pinpointing the Root Causes of Failures in Flash-based Storage Systems
- Award number: 1566554
- Fiscal year: 2016
- Amount: $59,500
- Award type: Standard Grant
NeTS: CSR: Small: Towards a Redundancy Aware Network Stack
- Award number: 1618321
- Fiscal year: 2016
- Amount: $59,500
- Award type: Standard Grant
CRII: CSR: Towards Understanding and Mitigating the Impact of Web Robot Traffic on Web Systems
- Award number: 1464104
- Fiscal year: 2015
- Amount: $59,500
- Award type: Standard Grant