CSR: Medium:Collaborative Research:Holistic, Cross-Site, Hybrid System Anomaly Debugging for Large Scale Hosting Infrastructures

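Basic Information

  • Award number:
    1514256
  • Principal investigator:
  • Amount:
    $282,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Period:
    2015-08-01 to 2020-07-31
  • Project status:
    Completed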

Project Abstract

Large-scale shared hosting infrastructures such as multi-tenant cloud computing systems have become increasingly popular by allowing users to lease resources on demand in a cost-effective way. Because multiple tenants may share computing resources, hosting infrastructures are complex systems and prone to various system anomalies. Although software developers often perform rigorous offline testing, many subtle bugs manifest themselves only during large-scale production runs. Many anomalies, such as those where the system does not crash but fails to behave as expected, are hard to reproduce and diagnose using existing techniques. Existing system anomaly diagnosis work can be broadly classified into two categories: 1) black-box schemes, which do not require source code and are suitable for online, production-site diagnosis, and 2) white-box schemes, which require source code and expensive code instrumentation and are suitable for offline, development-site diagnosis. Although white-box schemes provide fine-grained diagnosis, large-scale production hosting infrastructures are reluctant to adopt them because of their high overhead and intrusive system recording approaches.
The overarching objective of this project is to explore an innovative cross-site system anomaly debugging approach that intelligently integrates production-site black-box diagnosis with development-site white-box debugging into a more powerful hosting infrastructure debugging framework. This project will develop techniques for offline, development-site white-box debugging that take the production-site fault inference results as guidance to find the exact anomaly causes. The project will focus on diagnosing non-crashing system anomalies (e.g., performance degradation, service outage, software hang, unexpected halt) that are common in real-world hosting infrastructures but are difficult to debug using existing techniques.
Techniques developed in this project will have a significant impact on improving the robustness of real-world hosting infrastructures. The PIs will develop new course modules on hosting infrastructure debugging for both the graduate and undergraduate classes they regularly teach. This project will develop programming courseware based on the research prototypes developed in this project. The PIs will use their position as role models and a set of outreach activities to recruit more female students to pursue systems research. The PIs will disseminate their results and collected data broadly through publication and technology transfer. Developed software artifacts and experimental datasets will be released for public use.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Shan Lu

The Research of Enterprise Informatization Upgrade Investment Resource Allocation
Design of a sector bowtie nano-rectenna for optical power and infrared detection
  • DOI:
    10.1007/s11467-015-0508-7
  • Published:
    2015-10
  • Journal:
  • Impact factor:
    7.5
  • Authors:
    Kai Wang;Haifeng Hu;Shan Lu;Lingju Guo;Tao He
  • Corresponding author:
    Tao He
Microbacterium chengjingii sp. nov. and Microbacterium fandaimingii sp. nov., isolated from bat faeces of Hipposideros and Rousettus species.
Generalized construction of signature code for multiple-access adder channel
Decoding for non-binary signature code

Other grants by Shan Lu

CSR: Medium: Improving the Interface between Machine Learning and Software Systems
  • Award number:
    2313190
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Standard Grant
NSF Student Travel Grant for 2020 ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
  • Award number:
    1936025
  • Fiscal year:
    2020
  • Award amount:
    $282,000
  • Project type:
    Standard Grant
CNS Core: Medium: Accurate Anytime Learning for Energy and Timeliness in Software Systems
  • Award number:
    1956180
  • Fiscal year:
    2020
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Student Travel Support for 2016 USENIX Annual Technical Conference
  • Award number:
    1632170
  • Fiscal year:
    2016
  • Award amount:
    $282,000
  • Project type:
    Standard Grant
BIGDATA: Collaborative Research: F: Holistic Optimization of Data-Driven Applications
  • Award number:
    1546543
  • Fiscal year:
    2015
  • Award amount:
    $282,000
  • Project type:
    Standard Grant
CAREER: Combating Performance Bugs in Software Systems
  • Award number:
    1514189
  • Fiscal year:
    2014
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
XPS: FULL: CCA: Production-Run Failure Recovery Based Approach to Reliable Parallel Software
  • Award number:
    1439091
  • Fiscal year:
    2014
  • Award amount:
    $282,000
  • Project type:
    Standard Grant
CAREER: Combating Performance Bugs in Software Systems
  • Award number:
    1054616
  • Fiscal year:
    2011
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Fighting Concurrency Bugs through Effect-Oriented Approaches
  • Award number:
    1018180
  • Fiscal year:
    2010
  • Award amount:
    $282,000
  • Project type:
    Standard Grant

Similar overseas grants

Collaborative Research: CSR: Medium: Scaling Secure Serverless Computing on Heterogeneous Datacenters
  • Award number:
    2312206
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Architecting GPUs for Practical Homomorphic Encryption-based Computing
  • Award number:
    2312276
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
  • Award number:
    2312689
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
  • Award number:
    2401244
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Scaling Secure Serverless Computing on Heterogeneous Datacenters
  • Award number:
    2312207
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Adaptive Environmental Awareness for Collaborative Augmented Reality
  • Award number:
    2312760
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Core: Medium: Scaling Unix/Linux Shell Programs
  • Award number:
    2312346
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: MemDrive: Memory-Driven Full-Stack Collaboration for Autonomous Embedded Systems
  • Award number:
    2312397
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: MemDrive: Memory-Driven Full-Stack Collaboration for Autonomous Embedded Systems
  • Award number:
    2312396
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant
Collaborative Research: CSR: Medium: Adaptive Environmental Awareness for Collaborative Augmented Reality
  • Award number:
    2312761
  • Fiscal year:
    2023
  • Award amount:
    $282,000
  • Project type:
    Continuing Grant