CAREER: End-to-End Network Design for Unified Memory Disaggregation

Basic Information

Project Abstract

Applications in modern cloud datacenters are deployed in resource containers to isolate them from each other. Memory stranding is a pervasive problem in such containerized datacenters: many memory-intensive applications grind to a halt even when free memory exists on other machines. This leads to low utilization, memory fragmentation, and increased overall cost. Memory disaggregation over ultra-fast networks can, in theory, pool such stranded memory together, but making it practical faces novel systems design, algorithmic, and integration challenges. These include bridging the still-sizable latency gap between local memory access and Remote Direct Memory Access (RDMA); transparently addressing network-wide fault tolerance, load imbalance, performance isolation, and scalability issues; and enabling support for heterogeneous software and hardware technologies.
The overarching research objective of this proposal is to realize a Unified Disaggregated Memory (UDM) abstraction over ultra-fast networks that exposes stranded memory across the datacenter as a pool of available memory to out-of-memory containers in a fast, resilient, and scalable manner, without any changes to the applications. By designing a comprehensive solution that addresses the host-level, network-level, and end-to-end aspects of these challenges, this research aims to make memory disaggregation practical.
Specifically, by leveraging the unique characteristics of memory-intensive workloads, ultra-low-latency networks, and multi-tenancy in modern datacenters, this proposal will (i) design a low-latency host networking stack; (ii) enable performance isolation throughout the network; (iii) provide resilience to network-wide uncertainties such as failures and load imbalance; and (iv) incorporate support for heterogeneous memory (e.g., persistent memory), networking technologies, and resource management software.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Aequitas: Admission Control for Performance-Critical RPCs in Datacenters
  • DOI:
    10.1145/3544216.3544271
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhang, Yiwen;Kumar, Gautam;Dukkipati, Nandita;Wu, Xian;Jha, Priyaranjan;Chowdhury, Mosharaf;Vahdat, Amin
  • Corresponding author:
    Vahdat, Amin
Hydra: Resilient and Highly Available Remote Memory
  • DOI:
  • Published:
    2019-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Youngmoon Lee;H. Maruf;Mosharaf Chowdhury;Asaf Cidon;K. Shin
  • Corresponding author:
    Youngmoon Lee;H. Maruf;Mosharaf Chowdhury;Asaf Cidon;K. Shin
NetLock: Fast, Centralized Lock Management Using Programmable Switches
Programmable packet scheduling with a single queue
  • DOI:
    10.1145/3452296.3472887
  • Published:
    2021-08
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhuolong Yu;Chuheng Hu;Jingfeng Wu;Xiao Sun;V. Braverman;Mosharaf Chowdhury;Zhenhua Liu;Xin Jin
  • Corresponding author:
    Zhuolong Yu;Chuheng Hu;Jingfeng Wu;Xiao Sun;V. Braverman;Mosharaf Chowdhury;Zhenhua Liu;Xin Jin
Effectively Prefetching Remote Memory with Leap
  • DOI:
  • Published:
    2019-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    H. Maruf;Mosharaf Chowdhury
  • Corresponding author:
    H. Maruf;Mosharaf Chowdhury

Other publications by Mosharaf Chowdhury

CDI-E: An Elastic Cloud Service for Data Engineering
  • DOI:
    10.14778/3554821.3554825
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Prakash Das;Shivangi Srivastava;Valentin Moskovich;Anmol Chaturvedi;Anant Mittal;Yongqin Xiao;Mosharaf Chowdhury
  • Corresponding author:
    Mosharaf Chowdhury
Fair Allocation of Heterogeneous and Interchangeable Resources
  • DOI:
    10.1145/3305218.3305227
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiao Sun;T. Le;Mosharaf Chowdhury;Zhenhua Liu
  • Corresponding author:
    Zhenhua Liu
Pyxis: Scheduling Mixed Tasks in Disaggregated Datacenters
Coflow: A Networking Abstraction for Distributed Data-Parallel Applications
  • DOI:
  • Published:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mosharaf Chowdhury
  • Corresponding author:
    Mosharaf Chowdhury
Resource Management in Multi-* Clusters: Cloud Provisioning
  • DOI:
  • Published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mosharaf Chowdhury
  • Corresponding author:
    Mosharaf Chowdhury

Other grants by Mosharaf Chowdhury

Collaborative Research: Conference: NSF NeTS PI Meeting - Spring 2023
  • Award number:
    2309858
  • Fiscal year:
    2023
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant
Collaborative Research: NGSDI: Foundations of Clean and Balanced Datacenters: Treehouse
  • Award number:
    2104243
  • Fiscal year:
    2021
  • Award amount:
    $578.2K
  • Project type:
    Continuing Grant
Collaborative Research: CNS Core: Medium: Systems Support for Federated Learning
  • Award number:
    2106184
  • Fiscal year:
    2021
  • Award amount:
    $578.2K
  • Project type:
    Continuing Grant
CNS Core: Medium: Collaborative Research: Towards Enabling Optimal Performance-Cost Tradeoffs in Distributed Storage
  • Award number:
    1900665
  • Fiscal year:
    2019
  • Award amount:
    $578.2K
  • Project type:
    Continuing Grant
CNS Core: Small: Multi-Scale GPU Resource Management for AI Applications
  • Award number:
    1909067
  • Fiscal year:
    2019
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant
NeTS: CSR: Medium: Collaborative Research: Enabling Flexible and High Performance Big Data Analytics Over Geo-Distributed Clouds
  • Award number:
    1563095
  • Fiscal year:
    2016
  • Award amount:
    $578.2K
  • Project type:
    Continuing Grant
XPS: FULL: A Cross-Layer Approach Toward Low-Latency Data-Parallel Applications in Rack-Scale Computing
  • Award number:
    1629397
  • Fiscal year:
    2016
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant
NeTS: Small: Collaborative Research: Enabling Application-Level Performance Predictability in Public Clouds
  • Award number:
    1617773
  • Fiscal year:
    2016
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant

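Similar NSFC (National Natural Science Foundation of China) grants

Structural study of how the fungus-specific endocytosis-related protein End3 functions
  • Award number:
    32000859
  • Approval year:
    2020
  • Award amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Exploring the peripheral mechanism of cinobufacini (Huachansu) in treating cancer pain via the PBMC-β-END-μ-opioid receptor pathway
  • Award number:
    81173612
  • Approval year:
    2011
  • Award amount:
    CNY 580,000
  • Project type:
    General Program
Study of the oncogenic properties and mechanism of action of EB1 (End-Binding protein 1)
  • Award number:
    30672361
  • Approval year:
    2006
  • Award amount:
    CNY 240,000
  • Project type:
    General Program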

Similar overseas grants

IMR: MT: AirScope: A Versatile and Programmable UAV Platform for End-to-End Cellular Network Measurements in Rural Environments
  • Award number:
    2323189
  • Fiscal year:
    2023
  • Award amount:
    $578.2K
  • Project type:
    Continuing Grant
Collaborative Research: NeTS: JUNO3: End-to-end network slicing and orchestration in future programmable converged wireless-optical networks
  • Award number:
    2210344
  • Fiscal year:
    2022
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant
Collaborative Research: NeTS: JUNO3: End-to-end network slicing and orchestration in future programmable converged wireless-optical networks
  • Award number:
    2210343
  • Fiscal year:
    2022
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant
The Role of End-Binding Protein 2 and Microtubule Network in Inherited Cardiac Arrhythmias
  • Award number:
    10351800
  • Fiscal year:
    2022
  • Award amount:
    $578.2K
  • Project type:
The Role of End-Binding Protein 2 and Microtubule Network in Inherited Cardiac Arrhythmias
  • Award number:
    10580832
  • Fiscal year:
    2022
  • Award amount:
    $578.2K
  • Project type:
From IoT to Cloud: A Network Function Virtualization end-to-end communication solution
  • Award number:
    RGPIN-2019-05250
  • Fiscal year:
    2022
  • Award amount:
    $578.2K
  • Project type:
    Discovery Grants Program - Individual
COVID-19 Evidence Network to support Decision-making (COVID-END) - Extension
  • Award number:
    457657
  • Fiscal year:
    2021
  • Award amount:
    $578.2K
  • Project type:
    Directed Grant
From IoT to Cloud: A Network Function Virtualization end-to-end communication solution
  • Award number:
    RGPIN-2019-05250
  • Fiscal year:
    2021
  • Award amount:
    $578.2K
  • Project type:
    Discovery Grants Program - Individual
From IoT to Cloud: A Network Function Virtualization end-to-end communication solution
  • Award number:
    RGPIN-2019-05250
  • Fiscal year:
    2020
  • Award amount:
    $578.2K
  • Project type:
    Discovery Grants Program - Individual
CC* Integration-Large: An 'On-the-fly' Deeply Programmable End-to-end Network-Centric Platform for Edge-to-Core Workflows
  • Award number:
    2018074
  • Fiscal year:
    2020
  • Award amount:
    $578.2K
  • Project type:
    Standard Grant