CSR: Small: Lightning in Clouds: Detection and Characterization of Very Short Bottlenecks
Basic Information
- Award number: 1421561
- Principal investigator: Calton Pu
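- Amount: $450,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-10-01 to 2017-09-30
- Status: Completed
- Source:
- Keywords: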
Project Abstract
A plausible explanation for the persistently low utilization of data centers (around 18%, according to Gartner reports) is the managerial need to maintain quality of service against the well-known Latency Long Tail problem, in which some apparently random requests that normally return within milliseconds suddenly take multiple seconds. The latency long tail problem arises at moderate utilization levels (e.g., 50%), with all resources far from saturation. Despite efforts to remedy the latency long tail problem in various ways, its causes have remained elusive: in most cases, the very requests that took several seconds return within milliseconds when executed by themselves. Studying and solving the latency long tail problem will enable better utilization while maintaining quality of service, leading to lower costs for cloud users, higher return on investment for cloud providers, and lower power consumption, which benefits the environment. The main goal of this project is to investigate the class of very short bottlenecks, in which the CPU becomes saturated for only a small fraction of a second, as a significant cause of latency long tail problems. Despite their short lifespan, very short bottlenecks can increase response times significantly (by several seconds), because strong dependencies among the tiers of an n-tier application system during request processing propagate queuing effects up and down the request chain. This project runs large-scale experiments in clouds and simulators to generate extensive fine-grained monitoring data on very short bottlenecks, which are virtually invisible to typical performance monitoring tools with sampling periods of seconds or minutes. To match the time scale of very short bottlenecks, special instrumentation tools are being refined to sample intra-server resource utilization at millisecond resolution and to timestamp inter-server messages at microsecond resolution.
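To illustrate why sampling periods of seconds hide very short bottlenecks, the following minimal Python sketch averages a synthetic per-millisecond CPU utilization trace over 1 s windows and over 50 ms windows. The trace (40% background load with a single 200 ms burst at 100%), the window sizes, the 99% saturation threshold, and the helper names are hypothetical choices for illustration only; they are not measurements or tools from this project.

```python
# Hypothetical illustration: the trace, window sizes and threshold below are
# made-up values, not data or tooling from this project.

def synthetic_trace(total_ms=3000, base=40.0, burst_start=1200, burst_len=200):
    """Per-millisecond CPU utilization (%): steady background load plus one
    short burst during which the CPU is fully saturated."""
    return [100.0 if burst_start <= t < burst_start + burst_len else base
            for t in range(total_ms)]

def window_means(trace, window_ms):
    """Mean utilization over consecutive, non-overlapping windows."""
    means = []
    for i in range(0, len(trace), window_ms):
        window = trace[i:i + window_ms]
        means.append(sum(window) / len(window))
    return means

def saturated_windows(trace, window_ms, threshold=99.0):
    """Indices of windows whose mean utilization reaches the threshold."""
    return [i for i, m in enumerate(window_means(trace, window_ms))
            if m >= threshold]

trace = synthetic_trace()
print(window_means(trace, 1000))       # [40.0, 52.0, 40.0]
print(saturated_windows(trace, 1000))  # [] : 1 s sampling flags nothing
print(saturated_windows(trace, 50))    # [24, 25, 26, 27] : burst is visible
```

With 1 s windows the fully saturated burst appears only as a 52% average in one window, which looks like moderate load; with 50 ms windows the four windows that cover the burst are flagged as saturated.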
Preliminary studies of n-tier application benchmarks with naturally bursty workloads have found very short bottlenecks that cause the latency long tail in several system layers: systems software (JVM garbage collection), processor architecture (dynamic voltage and frequency scaling), and consolidation of applications in virtualized cloud environments. These studies also point to many other potential sources of very short bottlenecks, e.g., kernel daemon processes that use 100% of a CPU for several milliseconds. Through careful distributed event analysis of the experimental data, new kinds of very short bottlenecks can be discovered, verified, reproduced, and studied in detail. Concrete solutions for specific very short bottlenecks have been developed, e.g., an improved Java garbage collector. However, other very short bottlenecks have no specific bug fix, e.g., those created by overlapping bursts of consolidated workloads, which are statistical in nature. As an alternative to bug fixes, more general solutions that disrupt queuing propagation are being explored. As a concrete example, instead of the classic request/response approach, in which waiting threads participate in queuing propagation, asynchronous requests with notification of responses, which reduce overall queuing, are being investigated as a potential way to eliminate or reduce the impact of several kinds of very short bottlenecks.
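The contrast between request/response and asynchronous designs can be sketched with a small discrete-time simulation of a front tier calling a back end that briefly stalls. This is a minimal sketch under explicitly hypothetical assumptions (a 10-thread front-tier pool, one arrival per millisecond alternating between back-end requests and requests served by the front tier alone, and a back end that makes no progress for 100 ms); the helper names `backend_finish` and `simulate` are illustrative, and this is not the project's experimental setup or its actual asynchronous design. In synchronous mode a thread blocks until the back-end reply arrives; in asynchronous mode the thread is released right after issuing the call and the reply is delivered by notification.

```python
import heapq

# All parameters are hypothetical, chosen only to make the effect visible.
POOL_SIZE = 10       # worker threads in the front tier
BACKEND_MS = 2       # normal back-end call latency (ms)
LOCAL_MS = 1         # work for requests served entirely by the front tier (ms)
STALL = (100, 200)   # back end makes no progress during [100, 200) ms

def backend_finish(issue_ms):
    """Completion time of a back-end call issued at issue_ms."""
    start, end = STALL
    if issue_ms + BACKEND_MS > start and issue_ms < end:
        return end + BACKEND_MS  # call is caught by the short bottleneck
    return issue_ms + BACKEND_MS

def simulate(synchronous, horizon_ms=400):
    """Return (worst local latency, worst back-end latency) in ms.

    One request arrives per ms; even-numbered arrivals need the back end,
    odd-numbered ones are served locally. Requests are served first come,
    first served, each taking the earliest-free thread.
    """
    free_at = [0] * POOL_SIZE  # min-heap of thread release times
    worst = {"local": 0, "backend": 0}
    for arrival in range(horizon_ms):
        start = max(arrival, heapq.heappop(free_at))
        if arrival % 2 == 0:
            done = backend_finish(start)
            # Synchronous: the thread waits for the reply, holding its slot.
            # Asynchronous: the thread is freed once the call is issued.
            release = done if synchronous else start
            kind = "backend"
        else:
            done = release = start + LOCAL_MS
            kind = "local"
        heapq.heappush(free_at, release)
        worst[kind] = max(worst[kind], done - arrival)
    return worst["local"], worst["backend"]

sync_local, sync_backend = simulate(synchronous=True)
async_local, async_backend = simulate(synchronous=False)
print("synchronous :", sync_local, sync_backend)
print("asynchronous:", async_local, async_backend)
```

In the synchronous run, threads blocked on the stalled back end exhaust the pool, so front-tier-only requests that never touch the back end wait tens of milliseconds for a thread: the queuing has propagated upstream. In the asynchronous run those requests still finish in `LOCAL_MS`, while back-end requests see the same worst-case delay in both modes.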
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Calton Pu
Buffer overflows: attacks and defenses for the vulnerability of the decade
- DOI: 10.1109/discex.2000.821514
- Published: 2000
- Journal:
- Impact factor: 0
- Authors: Crispin Cowan; Perry Wagle; Calton Pu; Steve Beattie; Jonathan Walpole
- Corresponding author: Jonathan Walpole
JTangCSB: A Cloud Service Bus for Cloud and Enterprise Application Integration
- DOI: 10.1109/mic.2014.62
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Xingjian Lu; Calton Pu; Zhaohui Wu; Hanwei Chen
- Corresponding author: Hanwei Chen
Other Grants by Calton Pu
RAPID: Tracking and Evaluation of the Coronavirus (COVID-19) Epidemic Propagation by Finding and Maintaining Live Knowledge in Social Media
- Award number: 2026945
- Fiscal year: 2020
- Amount: $450,000
- Award type: Standard Grant
HNDS-I: Collaborative Research: Developing a Data Platform for Analysis of Nonprofit Organizations
- Award number: 2024320
- Fiscal year: 2020
- Amount: $450,000
- Award type: Standard Grant
EAGER: Live Reality: Sustainable and Up-to-Date Information Quality in Live Social Media through Continuous Evidence-Based Knowledge Acquisition
- Award number: 2039653
- Fiscal year: 2020
- Amount: $450,000
- Award type: Standard Grant
1st US-Japan Workshop Enabling Global Collaborations in Big Data Research; June, 2017, Atlanta, GA
- Award number: 1741034
- Fiscal year: 2017
- Amount: $450,000
- Award type: Standard Grant
RCN: SAVI: Adaptive Management and Use of Resilient Infrastructures in Smart Cities: Support for Global Collaborative Research on Real-Time Analytics of Heterogeneous Big Data
- Award number: 1550379
- Fiscal year: 2015
- Amount: $450,000
- Award type: Standard Grant
EAGER: An Exploratory Study of Multi-Hazard Management through Multi-Source Integration of Physical and Social Sensors
- Award number: 1402266
- Fiscal year: 2014
- Amount: $450,000
- Award type: Standard Grant
SAVI: EAGER: for Global Research on Applying Information Technology to Support Effective Disaster Management (GRAIT-DM)
- Award number: 1250260
- Fiscal year: 2012
- Amount: $450,000
- Award type: Standard Grant
RAPID: Automating Emergency Data and Metadata Management to Support Effective Short Term and Long Term Disaster Recovery Efforts
- Award number: 1138666
- Fiscal year: 2011
- Amount: $450,000
- Award type: Standard Grant
CSR:Small: Multi-Bottlenecks: What They Are and How to Find Them
- Award number: 1116451
- Fiscal year: 2011
- Amount: $450,000
- Award type: Standard Grant
II-NEW: Collaborative Research: Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons)
- Award number: 0855180
- Fiscal year: 2009
- Amount: $450,000
- Award type: Standard Grant
Similar NSFC (National Natural Science Foundation of China) Grants
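Screening of small-molecule inhibitors targeting Treg-FOXP3 and study of their role and mechanism in lung cancer immunotherapy
- Award number: 32370966
- Approval year: 2023
- Amount: CNY 500,000
- Program: General Program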
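Epigenetic mechanism by which chemical small molecules activate YAP to induce chromatin plasticity and promote cardiac progenitor cell reprogramming
- Award number: 82304478
- Approval year: 2023
- Amount: CNY 300,000
- Program: Young Scientists Fund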
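Construction and mechanism of action of biomimetic glycyrrhizic acid nanoparticles targeting microglia: a new therapeutic strategy for sepsis-associated encephalopathy
- Award number: 82302422
- Approval year: 2023
- Amount: CNY 300,000
- Program: Young Scientists Fund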
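Role and mechanism of HMGB1/TLR4/Cathepsin B pathway-mediated microglial pyroptosis in hypoxic-ischemic encephalopathy in neonatal rats
- Award number: 82371712
- Approval year: 2023
- Amount: CNY 490,000
- Program: General Program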
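Role and mechanism of small cysteine-free proteins in regulating the insecticidal activity of biocontrol fungi
- Award number: 32372613
- Approval year: 2023
- Amount: CNY 500,000
- Program: General Program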
Similar Overseas Grants
Metabolism, Aging, Pathogenesis, Stress and Small RNAs Meeting
- Award number: 9990946
- Fiscal year: 2021
- Amount: $450,000
- Award type:
Studies on electron acceleration and multiplication in lightning by a ground-based array of small dosimeters
- Award number: 20K22354
- Fiscal year: 2020
- Amount: $450,000
- Award type: Grant-in-Aid for Research Activity Start-up
Collaborative Research: Characterizing Small-scale Lightning Discharges Associated with Explosive Volcanic Activity at Sakurajima Volcano
- Award number: 1445704
- Fiscal year: 2015
- Amount: $450,000
- Award type: Standard Grant
Collaborative Research: Characterizing Small-scale Lightning Discharges Associated with Explosive Volcanic Activity at Sakurajima Volcano
- Award number: 1445703
- Fiscal year: 2015
- Amount: $450,000
- Award type: Standard Grant