CRII: OAC: High-Efficiency Serverless Computing Systems for Deep Learning: A Hybrid CPU/GPU Architecture
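Basic Information
- Award number: 2153502
- Principal investigator:
- Amount: $174,900
- Institution:
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-05-01 to 2025-04-30
- Project status: Ongoing (not yet closed)
- Source:
- Keywords: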
Project Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).
Next-generation serverless cloud computing provides developers with simplified access to server management and administration, including event-driven execution, fine-grained resource provisioning, auto-scaling, and pay-as-you-go billing. The machine learning community is taking advantage of these benefits of serverless cloud computing to ease the development and deployment of deep learning (DL) applications. However, existing serverless computing platforms lack efficient support for GPUs, impeding DL practitioners from utilizing serverless computing for large-scale applications. This project will develop an efficient serverless computing platform with a hybrid CPU/GPU architecture to accelerate DL application development and deployment. The goal is to advance cutting-edge methodologies in both deep learning and serverless computing, which will result in a significant leap forward to benefit DL practitioners, DL users, and providers of cloud computing infrastructures, contributing to science advancement for society. The research findings will also enhance undergraduate and graduate education with exciting examples and demonstrations of real-world systems at the intersection of distributed computing, cloud computing, and deep learning.
The project will develop a novel serverless computing platform with a hybrid CPU/GPU architecture that will provide DL applications with native GPU performance. Two core components constitute the hybrid serverless computing architecture: a shim virtualized GPU (vGPU) layer and a refactored container subsystem. The shim vGPU layer enables high-performance GPU sharing for concurrent serverless functions with low latency and high scalability. This layer provides fine-grained GPU resource provisioning and performance isolation by intercepting GPU calls from serverless functions using API remoting techniques. The vGPU layer optimizes GPU performance in serverless computing via GPU context caching and locality-aware scheduling to mitigate cold starts and unnecessary data movement. The container subsystem accelerates the entire DL lifecycle by exploiting DL model structures and pipelined model loading to parallelize CPU-to-GPU memory copy and model execution. The subsystem exploits model partitioning techniques to accelerate the hybrid CPU/GPU architecture by dynamically distributing the DL model partitions to CPU and GPU. The scientific knowledge and tools designed and implemented from this research project will provide and enable innovations for next-generation cloud computing and deep learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
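The abstract says the container subsystem uses pipelined model loading to overlap CPU-to-GPU memory copies with model execution. The toy timing model below is a hypothetical sketch of why that helps, not the project's implementation. It assumes a model is a sequence of layers, each with a copy time and an execution time; that copies run one at a time on a single copy engine; and that a layer can execute only once its own weights are on the GPU and the previous layer has finished. The function names `sequential_latency` and `pipelined_latency` are illustrative.

```python
"""Toy timing model of pipelined model loading (hypothetical sketch).

Assumptions (not from the project): copies are serialized on one copy
engine, and layer i runs only after its weights are on the GPU and layer
i-1 has finished executing.
"""
from typing import Sequence


def sequential_latency(copy_ms: Sequence[float], exec_ms: Sequence[float]) -> float:
    # Baseline: copy the whole model to the GPU first, then run every layer.
    return sum(copy_ms) + sum(exec_ms)


def pipelined_latency(copy_ms: Sequence[float], exec_ms: Sequence[float]) -> float:
    # Overlap: while layer i executes, the copy engine is already moving layer i+1.
    if len(copy_ms) != len(exec_ms):
        raise ValueError("need one copy time and one exec time per layer")
    copied_at = 0.0    # time at which the current layer's weights are on the GPU
    finished_at = 0.0  # time at which the previous layer finished executing
    for c, e in zip(copy_ms, exec_ms):
        copied_at += c                       # copies are serialized on one engine
        start = max(copied_at, finished_at)  # need weights AND the previous output
        finished_at = start + e
    return finished_at
```

With copy times `[4, 4, 4]` ms and execution times `[3, 3, 3]` ms, the sequential baseline takes 21 ms, while the pipelined schedule finishes at 15 ms. Under these assumptions the pipelined latency can never fall below the total copy time plus the last layer's execution time, or the first copy plus the total execution time, so the gain is largest when copy and compute costs are balanced.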
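The abstract also says the vGPU layer combines GPU context caching with locality-aware scheduling to mitigate cold starts. The sketch below illustrates one way those two ideas can interact, under assumptions that are ours rather than the project's. Each GPU keeps an LRU cache of warm contexts keyed by function name. An invocation goes to a GPU that already holds its context when one exists; otherwise it goes to the least-loaded GPU and pays a cold start there. The names `Gpu`, `place`, and `release`, and the LRU policy itself, are hypothetical.

```python
"""Hypothetical sketch of locality-aware placement with GPU context caching.

Assumptions (illustrative only): each GPU caches up to `capacity` warm
contexts in LRU order; a cache hit avoids a cold start.
"""
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Gpu:
    gpu_id: int
    capacity: int   # max warm contexts this GPU keeps cached
    load: int = 0   # number of in-flight invocations
    warm: "OrderedDict[str, None]" = field(default_factory=OrderedDict)

    def touch(self, fn: str) -> bool:
        """Mark fn's context most recently used; return True if it was already warm."""
        if fn in self.warm:
            self.warm.move_to_end(fn)
            return True
        self.warm[fn] = None
        if len(self.warm) > self.capacity:
            self.warm.popitem(last=False)  # evict the least recently used context
        return False


def place(fn: str, gpus: list) -> tuple:
    """Return (gpu_id, cold_start) for one invocation of fn."""
    warm_gpus = [g for g in gpus if fn in g.warm]
    # Prefer locality (a cached context); break ties and misses by load, then id.
    target = min(warm_gpus or gpus, key=lambda g: (g.load, g.gpu_id))
    hit = target.touch(fn)
    target.load += 1
    return target.gpu_id, not hit


def release(gpu: Gpu) -> None:
    # The invocation finished; its context stays cached so later calls start warm.
    gpu.load -= 1
```

For example, with two GPUs that each cache two contexts, a first call to `"resnet"` cold-starts on GPU 0, a first call to `"bert"` cold-starts on the idle GPU 1, and a second `"resnet"` call returns to GPU 0 warm. This sketch always prefers a warm GPU, even a busier one. A fuller policy would weigh the cold-start cost against queueing delay.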
Project Outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Accelerating Serverless Computing by Harvesting Idle Resources
- DOI: 10.1145/3485447.3511979
- Published: 2021-08
- Journal:
- Impact factor: 0
- Authors: Hanfei Yu; Hao Wang; Jian Li; Xuemei Yuan; Seung-Jong Park
- Corresponding author: Hanfei Yu; Hao Wang; Jian Li; Xuemei Yuan; Seung-Jong Park
Libra: Harvesting Idle Resources Safely and Timely in Serverless Clusters
- DOI: 10.1145/3588195.3592996
- Published: 2023-08
- Journal:
- Impact factor: 0
- Authors: Hanfei Yu; Christian Fontenot; Hao Wang; Jian Li; Xu Yuan; Seung-Jong Park
- Corresponding author: Hanfei Yu; Christian Fontenot; Hao Wang; Jian Li; Xu Yuan; Seung-Jong Park
Other Publications by Hao Wang
Tetragon-based carbon allotropes T-C8 and its derivatives: A theoretical investigation
- DOI: 10.1016/j.commatsci.2017.12.028
- Published: 2018-03
- Journal:
- Impact factor: 3.3
- Authors: Yanan Lv; Hao Wang; Yuqing Guo; Bo Jiang; Yingxiang Cai
- Corresponding author: Yingxiang Cai
A phosphaphenanthrene-benzimidazole derivative for enhancing fire safety of epoxy resins
- DOI: 10.1016/j.reactfunctpolym.2022.105390
- Published: 2022-11
- Journal:
- Impact factor: 5.1
- Authors: Yixiang Xu; Junjie Wang; Wenbin Zhang; Siqi Huo; Zhengping Fang; Pingan Song; Dong Wang; Hao Wang
- Corresponding author: Hao Wang
Global existence and decay of solutions for hard potentials to the Fokker-Planck-Boltzmann equation without cut-off
- DOI: 10.3934/cpaa.2020135
- Published: 2020
- Journal:
- Impact factor: 1
- Authors: Lvqiao Liu; Hao Wang
- Corresponding author: Hao Wang
Global existence and decay of solutions for soft potentials to the Fokker–Planck–Boltzmann equation without cut-off
- DOI: 10.1016/j.jmaa.2020.123947
- Published: 2020
- Journal:
- Impact factor: 1.3
- Authors: Hao Wang
- Corresponding author: Hao Wang
Visualizing Plant Cells in A Brand New Way
- DOI: 10.1016/j.molp.2016.02.006
- Published:
- Journal:
- Impact factor: 27.5
- Authors: Hao Wang
- Corresponding author: Hao Wang
Other Grants by Hao Wang
RII Track-4:NSF: Federated Analytics Systems with Fine-grained Knowledge Comprehension: Achieving Accuracy with Privacy
- Award number: 2327480
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
- Award number: 2403398
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: SaTC: CORE: Small: Critical Learning Periods Augmented Robust Federated Learning
- Award number: 2315612
- Fiscal year: 2023
- Amount: $174,900
- Award type: Standard Grant
RI: Small: Enabling Interpretable AI via Bayesian Deep Learning
- Award number: 2127918
- Fiscal year: 2021
- Amount: $174,900
- Award type: Continuing Grant
US-China planning visit: Development of High Performance and Multifunctional Infrastructure Material
- Award number: 1338297
- Fiscal year: 2013
- Amount: $174,900
- Award type: Standard Grant
SBIR Phase II: SAFE: Behavior-based Malware Detection and Prevention
- Award number: 0750299
- Fiscal year: 2008
- Amount: $174,900
- Award type: Standard Grant
SBIR Phase I: SpiderWeb - Self-Healing Networks for Spyware Detection
- Award number: 0638170
- Fiscal year: 2007
- Amount: $174,900
- Award type: Standard Grant
Constructibility and Large Cardinal Numbers
- Award number: 7902941
- Fiscal year: 1979
- Amount: $174,900
- Award type: Standard Grant
Similar NSFC (National Natural Science Foundation of China) Grants
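Molecular basis by which Z8-12:OH and Z8-14:OAc maintain the sex-attractant specificity of the oriental fruit moth and the plum fruit moth, respectively
- Grant number: 32160636
- Approval year: 2021
- Amount: CNY 350,000
- Project type: Regional Science Fund Project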
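Mechanism of the photoisomerization reaction of the ruthenium nitrosyl complex [Ru(OAc)(2mqn)2NO]
- Grant number: 21603131
- Approval year: 2016
- Amount: CNY 190,000
- Project type: Young Scientists Fund Project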
Mn(OAc)3-promoted radical cascade reactions under mechanochemical conditions
- Grant number: 21242013
- Approval year: 2012
- Amount: CNY 100,000
- Project type: Special Fund Project
Similar International Grants
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
CRII: OAC: A Compressor-Assisted Collective Communication Framework for GPU-Based Large-Scale Deep Learning
- Award number: 2348465
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
OAC Core: Cost-Adaptive Monitoring and Real-Time Tuning at Function-Level
- Award number: 2402542
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
OAC Core: OAC Core Projects: GPU Geometric Data Processing
- Award number: 2403239
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant