CSR: Medium: Collaborative Research: Holistic, Cross-Site, Hybrid System Anomaly Debugging for Large Scale Hosting Infrastructures
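Basic information
- Award number: 1513942
- Principal investigator:
- Amount: $518,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2015
- Funding country: United States
- Duration: 2015-08-01 to 2021-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract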
Large-scale shared hosting infrastructures such as multi-tenant cloud computing systems have become increasingly popular by allowing users to lease resources on demand in a cost-effective way. Because multiple tenants may share computing resources, hosting infrastructures are complex systems that are prone to various system anomalies. Although software developers often perform rigorous offline testing, many subtle bugs only manifest themselves during large-scale production runs. Many anomalies, such as those where the system does not crash but fails to behave as expected, are hard to reproduce and diagnose using existing techniques. Existing system anomaly diagnosis work can be broadly classified into two categories: 1) black-box schemes, which do not require source code and are suitable for online, production-site diagnosis, and 2) white-box schemes, which require source code and expensive code instrumentation and are suitable for offline, development-site diagnosis. Although white-box schemes provide fine-grained diagnosis, large-scale production hosting infrastructures are reluctant to adopt them because of their high overhead and intrusive system recording approaches.
The overarching objective of this project is to explore an innovative cross-site system anomaly debugging approach that intelligently integrates production-site black-box diagnosis with development-site white-box debugging into a more powerful hosting infrastructure debugging framework. This project will develop techniques for offline, development-site white-box debugging that take the production-site fault inference results as guidance to find the exact anomaly causes. The project will focus on diagnosing non-crashing system anomalies (e.g., performance degradation, service outage, software hang, unexpected halt) that are common in real-world hosting infrastructures but are difficult to debug using existing techniques.
Techniques developed in this project will have a significant impact on improving the robustness of real-world hosting infrastructures. The PIs will develop new course modules on hosting infrastructure debugging for the graduate and undergraduate classes they regularly teach. This project will develop programming courseware based on its research prototypes. The PIs will serve as role models and use a set of outreach activities to recruit more female students to pursue systems research. The PIs will disseminate their results and collected data broadly through publication and technology transfer. Developed software artifacts and experimental datasets will be released for public use.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Xiaohui Gu
An Image Enhancement Method Based on Partial Differential Equations to Improve Dark Channel Theory
- DOI: 10.1088/1755-1315/769/4/042112
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Pengcheng Li; Xiaohui Gu
- Corresponding author: Xiaohui Gu
Adaptive data-driven service integrity attestation for multi-tenant cloud systems
- DOI:
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Juan Du; Xiaohui Gu; Nidhi Shah
- Corresponding author: Nidhi Shah
BridgeNet: An Adaptive Multi-Source Stream Dissemination Service Overlay
- DOI:
- Published: 2007
- Journal:
- Impact factor: 0
- Authors: Xiaohui Gu; Zhen Wen; Philip S. Yu
- Corresponding author: Philip S. Yu
You can be more trustworthy: A feature fusion reinforcement network for credible anti-noise fault diagnosis
- DOI: 10.1016/j.aei.2024.103056
- Published: 2025-03-01
- Journal:
- Impact factor: 9.900
- Authors: Yuan Wei; Hongchong Peng; Mansong Rong; Xiaohui Gu; Xiangyan Chen
- Corresponding author: Xiangyan Chen
A Retrospective Respiratory Gating System Based on Epipolar Consistency Conditions
- DOI: 10.32604/mcb.2019.07383
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Maosen Lian; Yi Li; Xiaohui Gu; Shouhua Luo
- Corresponding author: Shouhua Luo
Other grants by Xiaohui Gu
CAREER: Enable Robust Virtualized Hosting Infrastructures via Coordinated Learning, Recovery, and Diagnosis
- Award number: 1149445
- Fiscal year: 2012
- Amount: $518,000
- Award type: Continuing Grant
CSR: Small: Online System Anomaly Prediction and Diagnosis for Large-Scale Hosting Infrastructures
- Award number: 0915567
- Fiscal year: 2009
- Amount: $518,000
- Award type: Standard Grant
CSR: Small: Collaborative Research: Hybrid Opportunistic Computing for Green Clouds
- Award number: 0915861
- Fiscal year: 2009
- Amount: $518,000
- Award type: Continuing Grant
Similar overseas grants
Collaborative Research: CSR: Medium: Scaling Secure Serverless Computing on Heterogeneous Datacenters
- Award number: 2312206
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Architecting GPUs for Practical Homomorphic Encryption-based Computing
- Award number: 2312276
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2312689
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2401244
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Scaling Secure Serverless Computing on Heterogeneous Datacenters
- Award number: 2312207
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Adaptive Environmental Awareness for Collaborative Augmented Reality
- Award number: 2312760
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Core: Medium: Scaling Unix/Linux Shell Programs
- Award number: 2312346
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: MemDrive: Memory-Driven Full-Stack Collaboration for Autonomous Embedded Systems
- Award number: 2312397
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: MemDrive: Memory-Driven Full-Stack Collaboration for Autonomous Embedded Systems
- Award number: 2312396
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Adaptive Environmental Awareness for Collaborative Augmented Reality
- Award number: 2312761
- Fiscal year: 2023
- Amount: $518,000
- Award type: Continuing Grant