CSR: Small: Lightning in Clouds: Detection and Characterization of Very Short Bottlenecks

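Basic information

  • Award number:
    1421561
  • Principal investigator:
    Calton Pu
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Category:
    Standard Grant
  • Fiscal year:
    2014
  • Funding country:
    United States
  • Duration:
    2014-10-01 to 2017-09-30
  • Project status:
    Completed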

Project abstract

A plausible explanation for the persistently low utilization of data centers (around 18%, according to Gartner reports) is the managerial need to maintain quality of service in the face of the well-known latency long tail problem, in which some apparently random requests that normally return within milliseconds suddenly take several seconds. The latency long tail problem arises at moderate utilization levels (e.g., 50%), when all resources are far from saturation. Despite efforts to remedy it in various ways, its causes have remained elusive: in most cases, the very requests that took several seconds return within milliseconds when executed by themselves. Studying and solving the latency long tail problem will enable better utilization while maintaining quality of service, leading to lower costs for cloud users, higher return on investment for cloud providers, and lower power consumption, which benefits the environment. The main goal of this project is to investigate very short bottlenecks, in which the CPU becomes saturated for only a small fraction of a second, as a significant cause of the latency long tail. Despite their short lifespan, very short bottlenecks can increase response times significantly (by several seconds), because the strong dependencies among tiers during request processing propagate queuing effects up and down the request chain of an n-tier application. This project runs large-scale experiments in clouds and simulators to generate extensive fine-grained monitoring data on very short bottlenecks, which are virtually invisible to typical performance monitoring tools with sampling periods of seconds or minutes. To match the time scale of very short bottlenecks, special instrumentation tools are being refined to sample intra-server resource utilization at millisecond resolution and to timestamp inter-server messages at microsecond resolution.
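The point about sampling periods can be illustrated numerically. The following is a minimal Python sketch on a synthetic, hypothetical CPU trace (not project data): a core that is about 40% busy in the background, plus one 100 ms episode of full saturation. Averaged over 1-second windows, the episode shows up only as a modest rise in utilization; averaged over 50 ms windows, it appears as 100%. The function name `window_utilization` and all numbers are illustrative assumptions.

```python
def window_utilization(busy_ms, window_ms):
    """Average a per-millisecond busy trace (1 = CPU busy, 0 = idle) over fixed windows."""
    result = []
    for i in range(0, len(busy_ms), window_ms):
        chunk = busy_ms[i:i + window_ms]
        result.append(sum(chunk) / len(chunk))
    return result


# Hypothetical 2-second trace at 1 ms resolution: the CPU is busy 4 ms out of
# every 10 ms (40% background load) ...
trace = [1 if t % 10 < 4 else 0 for t in range(2000)]
# ... plus one assumed very short bottleneck: 100 ms of full saturation from t = 1200 ms.
for t in range(1200, 1300):
    trace[t] = 1

coarse = window_utilization(trace, 1000)  # 1 s sampling, typical of monitoring tools
fine = window_utilization(trace, 50)      # 50 ms sampling

print("1 s windows:", coarse)
print("peak over 50 ms windows:", max(fine))
```

With these assumed numbers, the two 1-second averages are 40% and 46%, which look like moderate load, while the busiest 50 ms window is fully saturated.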
Preliminary studies of n-tier application benchmarks with naturally bursty workloads have found very short bottlenecks that cause the latency long tail in several system layers: systems software (JVM garbage collection), processor architecture (dynamic voltage and frequency scaling), and application consolidation in virtualized cloud environments. These findings point to many other potential sources of very short bottlenecks, e.g., kernel daemon processes that use 100% of a CPU for several milliseconds. Careful distributed event analysis of the experimental data makes it possible to discover, verify, reproduce, and study new kinds of very short bottlenecks in detail. Concrete solutions have been developed for specific very short bottlenecks, e.g., an improved Java garbage collector. However, other very short bottlenecks have no specific bug fix, e.g., those created when the statistically bursty workloads of consolidated applications overlap. As an alternative to bug fixes, more general solutions that disrupt queuing propagation are being explored. For example, instead of the classic request/response approach, in which waiting threads participate in queuing propagation, the project is investigating asynchronous requests with response notification, which reduce overall queuing, as a potential way to eliminate or reduce the impact of several kinds of very short bottlenecks.
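The queuing-propagation argument can be made concrete with a toy discrete-event model. This is a minimal Python sketch under explicitly hypothetical assumptions, not the project's simulator or any measured system. A front tier with a fixed pool of 10 threads receives one request per millisecond; every other request is "dynamic" and calls a back tier that normally answers in 5 ms, while "static" requests need only 1 ms of front-tier work. The back tier suffers a single 50 ms stall that stands in for a very short bottleneck. In synchronous mode a thread blocks until the back-tier response arrives; in asynchronous mode it is released right after issuing the call, and the cost of handling the response notification is ignored. All names and parameters are illustrative.

```python
import heapq

FRONT_MS = 1          # assumed front-tier CPU work per request
BACK_MS = 5           # assumed normal back-tier latency
STALL = (100, 150)    # hypothetical 50 ms very short bottleneck in the back tier
THREADS = 10          # assumed front-tier thread pool size


def back_tier_done(issue_t):
    """Completion time of a back-tier call issued at issue_t.
    Simplification: any call overlapping the stall is frozen until the stall ends."""
    start, end = STALL
    if issue_t < end and issue_t + BACK_MS > start:
        return max(issue_t, end) + BACK_MS
    return issue_t + BACK_MS


def simulate(asynchronous, n_requests=400):
    """FIFO multi-server queue with one arrival per ms; even ids are dynamic.
    Returns response times split into (static, dynamic) lists."""
    free_at = [0] * THREADS            # min-heap: time at which each thread becomes free
    static_rt, dynamic_rt = [], []
    for rid in range(n_requests):
        arrival = rid
        start = max(arrival, heapq.heappop(free_at))  # earliest free thread, FIFO order
        front_done = start + FRONT_MS
        if rid % 2 == 0:               # dynamic request: needs the back tier
            done = back_tier_done(front_done)
            # sync: thread blocks until the response; async: released after issuing
            heapq.heappush(free_at, front_done if asynchronous else done)
            dynamic_rt.append(done - arrival)
        else:                          # static request: front tier only
            heapq.heappush(free_at, front_done)
            static_rt.append(front_done - arrival)
    return static_rt, dynamic_rt


for asynchronous in (False, True):
    static_rt, dynamic_rt = simulate(asynchronous)
    mode = "async" if asynchronous else "sync"
    print(f"{mode}: worst static {max(static_rt)} ms, worst dynamic {max(dynamic_rt)} ms")
```

In the synchronous run, threads blocked on the stalled back tier exhaust the pool, so even static requests that never touch the back tier wait tens of milliseconds; in the asynchronous run, static requests keep their 1 ms response time and only requests that actually depend on the stalled tier are delayed.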

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Calton Pu

Editorial for CollaborateCom 2011 Special Issue
  • DOI:
    10.1007/s11036-013-0436-0
  • Published:
    2013-02-28
  • Journal:
  • Impact factor:
    2.000
  • Authors:
    James Caverlee;Calton Pu;Dimitrios Georgakopoulos;James Joshi
  • Corresponding author:
    James Joshi
A rigorous approach to facilitate and guarantee the correctness of the genetic testing management in human genome information systems
  • DOI:
    10.1186/1471-2164-12-s4-s13
  • Published:
    2011-01-01
  • Journal:
  • Impact factor:
    3.700
  • Authors:
    Luciano V Araújo;Simon Malkowski;Kelly R Braghetto;Maria R Passos-Bueno;Mayana Zatz;Calton Pu;João E Ferreira
  • Corresponding author:
    João E Ferreira
Buffer overflows: attacks and defenses for the vulnerability of the decade
Editorial: Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2012)
  • DOI:
    10.1007/s11036-014-0532-9
  • Published:
    2014-09-16
  • Journal:
  • Impact factor:
    2.000
  • Authors:
    Lakshmish Ramaswamy;Barbara Carminati;James Joshi;Calton Pu
  • Corresponding author:
    Calton Pu
JTangCSB: A Cloud Service Bus for Cloud and Enterprise Application Integration
  • DOI:
    10.1109/mic.2014.62
  • Published:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xingjian Lu;Calton Pu;Zhaohui Wu;Hanwei Chen
  • Corresponding author:
    Hanwei Chen

Other grants by Calton Pu

RAPID: Tracking and Evaluation of the Coronavirus (COVID-19) Epidemic Propagation by Finding and Maintaining Live Knowledge in Social Media
  • Award number:
    2026945
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Category:
    Standard Grant
EAGER: Live Reality: Sustainable and Up-to-Date Information Quality in Live Social Media through Continuous Evidence-Based Knowledge Acquisition
  • Award number:
    2039653
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Category:
    Standard Grant
HNDS-I: Collaborative Research: Developing a Data Platform for Analysis of Nonprofit Organizations
  • Award number:
    2024320
  • Fiscal year:
    2020
  • Amount:
    $450,000
  • Category:
    Standard Grant
1st US-Japan Workshop Enabling Global Collaborations in Big Data Research; June, 2017, Atlanta, GA
  • Award number:
    1741034
  • Fiscal year:
    2017
  • Amount:
    $450,000
  • Category:
    Standard Grant
RCN: SAVI: Adaptive Management and Use of Resilient Infrastructures in Smart Cities: Support for Global Collaborative Research on Real-Time Analytics of Heterogeneous Big Data
  • Award number:
    1550379
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Category:
    Standard Grant
EAGER: An Exploratory Study of Multi-Hazard Management through Multi-Source Integration of Physical and Social Sensors
  • Award number:
    1402266
  • Fiscal year:
    2014
  • Amount:
    $450,000
  • Category:
    Standard Grant
SAVI: EAGER: for Global Research on Applying Information Technology to Support Effective Disaster Management (GRAIT-DM)
  • Award number:
    1250260
  • Fiscal year:
    2012
  • Amount:
    $450,000
  • Category:
    Standard Grant
RAPID: Automating Emergency Data and Metadata Management to Support Effective Short Term and Long Term Disaster Recovery Efforts
  • Award number:
    1138666
  • Fiscal year:
    2011
  • Amount:
    $450,000
  • Category:
    Standard Grant
CSR:Small: Multi-Bottlenecks: What They Are and How to Find Them
  • Award number:
    1116451
  • Fiscal year:
    2011
  • Amount:
    $450,000
  • Category:
    Standard Grant
II-NEW: Collaborative Research: Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons)
  • Award number:
    0855180
  • Fiscal year:
    2009
  • Amount:
    $450,000
  • Category:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) grants

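Forensic application of circadian-rhythmic small RNAs in estimating the time since bloodstain formation
  • Award number:
  • Approval year:
    2024
  • Amount:
    CNY 0
  • Category:
    Provincial/municipal project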
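Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Approval year:
    2022
  • Amount:
    CNY 100,000
  • Category:
    Provincial/municipal project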
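Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Approval year:
    2020
  • Amount:
    CNY 240,000
  • Category:
    Young Scientists Fund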
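Mechanism of small RNA regulation of the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Approval year:
    2019
  • Amount:
    CNY 580,000
  • Category:
    General Program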
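Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing and biofilm formation
  • Award number:
    81900988
  • Approval year:
    2019
  • Amount:
    CNY 210,000
  • Category:
    Young Scientists Fund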
Molecular mechanism of pigeon crop milk secretion analyzed by small RNA sequencing
  • Award number:
    31802058
  • Approval year:
    2018
  • Amount:
    CNY 260,000
  • Category:
    Young Scientists Fund
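Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Approval year:
    2018
  • Amount:
    CNY 560,000
  • Category:
    General Program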
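Pathogenic mechanism of rice grassy stunt virus via small RNA-mediated regulation of DNA methylation
  • Award number:
    31772128
  • Approval year:
    2017
  • Amount:
    CNY 600,000
  • Category:
    General Program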
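Immunoregulatory mechanism of acupuncture treatment of Hashimoto's thyroiditis based on small RNA-seq
  • Award number:
    81704176
  • Approval year:
    2017
  • Amount:
    CNY 200,000
  • Category:
    Young Scientists Fund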
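Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Approval year:
    2016
  • Amount:
    CNY 850,000
  • Category:
    Major Research Plan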

Similar international grants

CSR: Small: Leveraging Physical Side-Channels for Good
  • Award number:
    2312089
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
NeTS: Small: NSF-DST: Modernizing Underground Mining Operations with Millimeter-Wave Imaging and Networking
  • Award number:
    2342833
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
CPS: Small: NSF-DST: Autonomous Operations of Multi-UAV Uncrewed Aerial Systems using Onboard Sensing to Monitor and Track Natural Disaster Events
  • Award number:
    2343062
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
Collaborative Research: FET: Small: Reservoir Computing with Ion-Channel-Based Memristors
  • Award number:
    2403559
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
Comprehensive characterization of staphylococcal small colony variants using omics analysis
  • Award number:
    24K13443
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Grant-in-Aid for Scientific Research (C)
AF: Small: Problems in Algorithmic Game Theory for Online Markets
  • Award number:
    2332922
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
Collaborative Research: FET: Small: Algorithmic Self-Assembly with Crisscross Slats
  • Award number:
    2329908
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
NeTS: Small: ML-Driven Online Traffic Analysis at Multi-Terabit Line Rates
  • Award number:
    2331111
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
  • Award number:
    2331302
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant
Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
  • Award number:
    2331301
  • Fiscal year:
    2024
  • Amount:
    $450,000
  • Category:
    Standard Grant