SHF: Small: Lightweight Virtualization Driven Elastic Memory Management and Cluster Scheduling

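Basic information

  • Award number:
    1816850
  • Principal investigator:
  • Amount:
    $376,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-07-01 to 2022-06-30
  • Project status:
    Completed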

Abstract

Data centers are evolving to host heterogeneous workloads on shared clusters to reduce operational cost and achieve high resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and performance constraints on heterogeneous hardware. Data-parallel processing often suffers from interference and significant memory pressure, resulting in excessive garbage collection and out-of-memory errors that harm application performance and reliability. Cluster memory management and scheduling remain inefficient, leading to low utilization and poor multi-service support. Existing approaches focus on either application awareness or operating system awareness, and are thus not well positioned to address the semantic gap between application run-times and the operating system. This project aims to improve application performance and cluster efficiency via lightweight virtualization-enabled elastic memory management and cluster scheduling. It combines system experimentation with rigorous design and analysis to improve performance and efficiency and to tackle the memory pressure of data-parallel processing. The developed system software will be open-sourced, providing opportunities to foster a large ecosystem that spans system software providers and customers. Enabled by lightweight containers, cluster scheduling and the underlying operating system can cooperate synergistically, such that the dynamic resource demand of an application can be exposed to the operating system, and the cluster memory manager and scheduler can be assisted with rich run-time information retrieved from performance counters and the operating system.
To this end, the project aims to devise a distributed memory manager for data-parallel programs that can survive memory pressure and enable elastic cluster memory management with architecture-aware container placement; design a cooperative paging mechanism that improves the performance of memory swapping by extending the current virtual memory reclaim mechanism in the Linux kernel; enable memory over-commitment for elastic cluster scheduling with a new service that can detect and exploit over-commitment opportunities; and design a multi-queue-based distributed task scheduler to manage performance interference and hardware heterogeneity. The contributions include a library of developed mechanisms and open-source system software at the cluster and kernel levels that can significantly improve cluster utilization and application performance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal papers (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Elastic Parameter Server: Accelerating ML Training With Scalable Resource Scheduling
Semantic-aware Workflow Construction and Analysis for Distributed Data Analytics Systems
Memory at your service: fast memory allocation for latency-critical services
OS-Augmented Oversubscription of Opportunistic Memory with a User-Assisted OOM Killer
FlashByte: Improving Memory Efficiency with Lightweight Native Storage
  • DOI:
    10.1109/ccgrid51090.2021.00016
  • Year published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhao, Junxian; Pi, Aidi; Wang, Shaoqi; Zhou, Xiaobo
  • Corresponding author:
    Zhou, Xiaobo

Other publications by Xiaobo Zhou

Enhanced statistics-based rate adaptation for 802.11 wireless networks
Biomarker Discovery from Proteomics
Bayesian peak detection for Pro-TOF MS MALDI data
Molecular Orientation of Polymer Acceptor Dominates Open-Circuit Voltage Losses in All-Polymer Solar Cells
  • DOI:
    10.1021/acsenergylett.9b00416
  • Year published:
    2019
  • Journal:
  • Impact factor:
    22
  • Authors:
    Ke Zhou; Yang Wu; Yanfeng Liu; Xiaobo Zhou; Lin Zhang; Wei Ma
  • Corresponding author:
    Wei Ma
A Comparative Study of Human Motion Capture and Computational Analysis Tools
  • DOI:
  • Year published:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Seung; Xiaobo Zhou; Dan K Ramsey; V. Krovi
  • Corresponding author:
    V. Krovi

Other grants awarded to Xiaobo Zhou

Developing novel machine learning approaches to studying cell development
  • Award number:
    2326879
  • Fiscal year:
    2023
  • Amount awarded:
    $376,000
  • Project category:
    Continuing Grant
Developing Random Field based novel approaches for spatial transcriptomics
  • Award number:
    2217515
  • Fiscal year:
    2022
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
CSR: Small: Moving MapReduce into the Cloud: Flexibility, Efficiency, and Elasticity
  • Award number:
    1422119
  • Fiscal year:
    2014
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
NSF Travel Grant Support for IEEE ICCCN 2012 Conference
  • Award number:
    1238494
  • Fiscal year:
    2012
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
CSR: Small: Autonomous Performance and Power Control on Virtualized Servers
  • Award number:
    1217979
  • Fiscal year:
    2012
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
CAREER: Building Resilient Internet Services with Learning and Control
  • Award number:
    0844983
  • Fiscal year:
    2009
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
CSR-PDOS: Resource Allocation Optimization for Quantitative Slowdown Differentiation in Multi-tier Server Clusters
  • Award number:
    0720524
  • Fiscal year:
    2007
  • Amount awarded:
    $376,000
  • Project category:
    Continuing Grant

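Similar grants from the National Natural Science Foundation of China (NSFC)

Forensic application of circadian-rhythmic small RNAs in estimating the time of bloodstain formation
  • Award number:
  • Approval year:
    2024
  • Amount awarded:
    CNY 0
  • Project category:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Approval year:
    2022
  • Amount awarded:
    CNY 100,000
  • Project category:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Approval year:
    2020
  • Amount awarded:
    CNY 240,000
  • Project category:
    Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Approval year:
    2019
  • Amount awarded:
    CNY 580,000
  • Project category:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Approval year:
    2019
  • Amount awarded:
    CNY 210,000
  • Project category:
    Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Approval year:
    2018
  • Amount awarded:
    CNY 560,000
  • Project category:
    General Program
Molecular mechanism of pigeon milk secretion analyzed with small RNA sequencing
  • Award number:
    31802058
  • Approval year:
    2018
  • Amount awarded:
    CNY 260,000
  • Project category:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Approval year:
    2017
  • Amount awarded:
    CNY 600,000
  • Project category:
    General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
  • Award number:
    81704176
  • Approval year:
    2017
  • Amount awarded:
    CNY 200,000
  • Project category:
    Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Approval year:
    2016
  • Amount awarded:
    CNY 850,000
  • Project category:
    Major Research Plan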

Similar overseas grants

Development of MR fluid clutch mechanism equipped with small-lightweight and flexibility, and application to electric elbow prosthesis
  • Award number:
    21K12779
  • Fiscal year:
    2021
  • Amount awarded:
    $376,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Collaborative Research: SHF: Small: Lightweight Modular Typestate
  • Award number:
    2007024
  • Fiscal year:
    2020
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
Snap Motor: Creating a Small and Lightweight Actuation Platform by Impulse Force Interaction
  • Award number:
    20H02106
  • Fiscal year:
    2020
  • Amount awarded:
    $376,000
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Collaborative Research: SHF: Small: Lightweight Modular Typestate
  • Award number:
    2005889
  • Fiscal year:
    2020
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
Odor sensing, feature extraction and its classification by machine learning and fabrication of small and lightweight e-noses
  • Award number:
    20K11888
  • Fiscal year:
    2020
  • Amount awarded:
    $376,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Lightweight flexible shoulder prosthesis with various operation input modalities composed mainly of voice and small safe intuitive feedback device
  • Award number:
    19K20741
  • Fiscal year:
    2019
  • Amount awarded:
    $376,000
  • Project category:
    Grant-in-Aid for Early-Career Scientists
NeTS: Small: Collaborative Research: Lightweight Adaptive Algorithms for Network Optimization at Scale towards Emerging Services
  • Award number:
    1814614
  • Fiscal year:
    2018
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant
Paradigm_Shift - breakthrough Small Lightweight EV Platform
  • Award number:
    104328
  • Fiscal year:
    2018
  • Amount awarded:
    $376,000
  • Project category:
    Collaborative R&D
Development of very small and lightweight AC current supply with an air-core high-temperature superconducting transformer
  • Award number:
    18K04080
  • Fiscal year:
    2018
  • Amount awarded:
    $376,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
NeTS: Small: Collaborative Research: Lightweight Adaptive Algorithms for Network Optimization at Scale towards Emerging Services
  • Award number:
    1814322
  • Fiscal year:
    2018
  • Amount awarded:
    $376,000
  • Project category:
    Standard Grant