CRII: OAC: High-Efficiency Serverless Computing Systems for Deep Learning: A Hybrid CPU/GPU Architecture
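Basic Information
- Award number: 2153502
- Principal investigator:
- Amount: $174,900
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-05-01 to 2025-04-30
- Status: Ongoing
- Source:
- Keywords:
Project Abstract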
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).
Next-generation serverless cloud computing provides developers with simplified access to server management and administration, including event-driven execution, fine-grained resource provisioning, auto-scaling, and pay-as-you-go billing. The machine learning community is taking advantage of these benefits of serverless cloud computing to ease the development and deployment of deep learning (DL) applications. However, existing serverless computing platforms lack efficient support for GPUs, impeding DL practitioners from utilizing serverless computing for large-scale applications. This project will develop an efficient serverless computing platform with a hybrid CPU/GPU architecture to accelerate DL application development and deployment. The goal is to advance cutting-edge methodologies in both deep learning and serverless computing, which will result in a significant leap forward to benefit DL practitioners, DL users, and providers of cloud computing infrastructures, contributing to science advancement for society. The research findings will also enhance undergraduate and graduate education with exciting examples and demonstrations of real-world systems at the intersection of distributed computing, cloud computing, and deep learning.
The project will develop a novel serverless computing platform with a hybrid CPU/GPU architecture that will provide DL applications with native GPU performance. Two core components constitute the hybrid serverless computing architecture, a shim virtualized GPU (vGPU) layer and a refactored container subsystem. The shim vGPU layer enables high-performance GPU sharing for concurrent serverless functions with low latency and high scalability. This layer provides fine-grained GPU resource provisioning and performance isolation by intercepting GPU calls from serverless functions using API remoting techniques. The vGPU layer optimizes GPU performance in serverless computing via GPU context caching and locality-aware scheduling to mitigate cold-starts and unnecessary data movement. The container subsystem accelerates the entire DL lifecycle by exploiting DL model structures and pipelined model loading to parallelize CPU-to-GPU memory copy and model execution. The subsystem exploits model partitioning techniques to accelerate the hybrid CPU/GPU architecture by dynamically distributing the DL model partitions to CPU and GPU. The scientific knowledge and tools designed and implemented from this research project will provide and enable innovations for next-generation cloud computing and deep learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
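The abstract names GPU context caching and locality-aware scheduling as the means of reducing cold starts. The following is a minimal illustrative sketch of that general idea, not the project's implementation: the names `GpuWorker`, `schedule`, and `simulate`, the per-worker `capacity`, the LRU eviction policy, and the "fewest cached contexts" fallback are all assumptions made for this example. No real GPU calls are issued; a cached context is modeled as a dictionary entry.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GpuWorker:
    """One GPU with a small LRU cache of warm contexts, keyed by model name.

    Hypothetical model: `capacity` is the number of contexts kept warm.
    """
    name: str
    capacity: int
    warm: OrderedDict = field(default_factory=OrderedDict)

    def run(self, model: str) -> str:
        """Serve one invocation; return 'warm' on a cache hit, else 'cold'."""
        if model in self.warm:
            self.warm.move_to_end(model)  # mark as most recently used
            return "warm"
        if len(self.warm) >= self.capacity:
            self.warm.popitem(last=False)  # evict the least recently used context
        self.warm[model] = True
        return "cold"


def schedule(workers: List[GpuWorker], model: str) -> GpuWorker:
    """Locality first: pick a worker that already caches the model.

    Otherwise fall back to the worker holding the fewest cached contexts
    (an assumed stand-in for a real load metric).
    """
    for worker in workers:
        if model in worker.warm:
            return worker
    return min(workers, key=lambda w: len(w.warm))


def simulate(requests: List[str], workers: List[GpuWorker]) -> List[Tuple[str, str, str]]:
    """Route each request and record (model, worker name, 'warm' or 'cold')."""
    results = []
    for model in requests:
        worker = schedule(workers, model)
        results.append((model, worker.name, worker.run(model)))
    return results


if __name__ == "__main__":
    gpus = [GpuWorker("gpu0", capacity=1), GpuWorker("gpu1", capacity=1)]
    for row in simulate(["resnet", "bert", "resnet", "bert", "gpt"], gpus):
        print(row)
```

In this toy setup, repeated requests for the same model are routed back to the worker that already holds its context, so only the first invocation of each model pays the cold-start cost until the cache evicts it.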
Project Outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Accelerating Serverless Computing by Harvesting Idle Resources
- DOI: 10.1145/3485447.3511979
- Published: 2021-08
- Journal:
- Impact factor: 0
- Authors: Hanfei Yu; Hao Wang; Jian Li; Xuemei Yuan; Seung-Jong Park
- Corresponding authors: Hanfei Yu; Hao Wang; Jian Li; Xuemei Yuan; Seung-Jong Park
Libra: Harvesting Idle Resources Safely and Timely in Serverless Clusters
- DOI: 10.1145/3588195.3592996
- Published: 2023-08
- Journal:
- Impact factor: 0
- Authors: Hanfei Yu; Christian Fontenot; Hao Wang; Jian Li; Xu Yuan; Seung-Jong Park
- Corresponding authors: Hanfei Yu; Christian Fontenot; Hao Wang; Jian Li; Xu Yuan; Seung-Jong Park
Other publications by Hao Wang
Oxidative stress increases the 17,20-lyase-catalyzing activity of adrenal P450c17 through p38α in the development of hyperandrogenism
- DOI: 10.1016/j.mce.2019.01.020
- Published: 2019
- Journal:
- Impact factor: 4.1
- Authors: Wenjiao Zhu; Bing Han; Mengxia Fan; Nan Wang; Hao Wang; Hui Zhu; Tong Cheng; Shuangxia Zhao; Huaidong Song; Jie Qiao
- Corresponding author: Jie Qiao
Interacting Superprocesses with Discontinuous Spatial Motion and their Associated SPDEs
- DOI:
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Zhen; Hao Wang; J. Xiong
- Corresponding author: J. Xiong
State classification for a class of measure-valued branching diffusions in a Brownian medium
- DOI:
- Published: 1997
- Journal:
- Impact factor: 0
- Authors: Hao Wang
- Corresponding author: Hao Wang
Weighted 3D GS algorithm for image-quality improvement of multi-plane holographic display
- DOI: 10.3788/cjl201239.1009001
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Fang. Li; Y. Bi; Hao Wang; Minyuan Sun; Xinxin Kong
- Corresponding author: Xinxin Kong
Investigations into the Rock Dynamic Response under Blasting Load by an Improved DDA Approach
- DOI: 10.1155/2021/8827022
- Published: 2021-02
- Journal:
- Impact factor: 1.8
- Authors: Biting Xie; Xiuli Zhang; Hao Wang; Yuyong Jiao; Fei Zheng
- Corresponding author: Fei Zheng
Other grants by Hao Wang
RII Track-4:NSF: Federated Analytics Systems with Fine-grained Knowledge Comprehension: Achieving Accuracy with Privacy
- Award number: 2327480
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
- Award number: 2403398
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: SaTC: CORE: Small: Critical Learning Periods Augmented Robust Federated Learning
- Award number: 2315612
- Fiscal year: 2023
- Amount: $174,900
- Award type: Standard Grant
RI: Small: Enabling Interpretable AI via Bayesian Deep Learning
- Award number: 2127918
- Fiscal year: 2021
- Amount: $174,900
- Award type: Continuing Grant
US-China planning visit: Development of High Performance and Multifunctional Infrastructure Material
- Award number: 1338297
- Fiscal year: 2013
- Amount: $174,900
- Award type: Standard Grant
SBIR Phase II: SAFE: Behavior-based Malware Detection and Prevention
- Award number: 0750299
- Fiscal year: 2008
- Amount: $174,900
- Award type: Standard Grant
SBIR Phase I: SpiderWeb - Self-Healing Networks for Spyware Detection
- Award number: 0638170
- Fiscal year: 2007
- Amount: $174,900
- Award type: Standard Grant
Constructibility and Large Cardinal Numbers
- Award number: 7902941
- Fiscal year: 1979
- Amount: $174,900
- Award type: Standard Grant
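Similar NSFC (National Natural Science Foundation of China) grants
Molecular basis by which Z8-12:OH and Z8-14:OAc respectively maintain sex-attractant specificity in the oriental fruit moth and the plum fruit moth
- Award number:
- Approval year: 2021
- Amount: CNY 350,000
- Award type: Regional Science Fund Project
Mechanism of the photoisomerization reaction of the nitrosyl ruthenium complex [Ru(OAc)(2mqn)2NO]
- Award number: 21603131
- Approval year: 2016
- Amount: CNY 190,000
- Award type: Young Scientists Fund Project
Mn(OAc)3-promoted radical cascade reactions under mechanochemical conditions
- Award number: 21242013
- Approval year: 2012
- Amount: CNY 100,000
- Award type: Special Fund Project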
Similar overseas grants
CRII: OAC: A Compressor-Assisted Collective Communication Framework for GPU-Based Large-Scale Deep Learning
- Award number: 2348465
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
OAC Core: Cost-Adaptive Monitoring and Real-Time Tuning at Function-Level
- Award number: 2402542
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
OAC Core: OAC Core Projects: GPU Geometric Data Processing
- Award number: 2403239
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
CRII: OAC: Dynamically Adaptive Unstructured Mesh Technologies for High-Order Multiscale Fluid Dynamics Simulations
- Award number: 2348394
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
CRII: OAC: A Multi-fidelity Computational Framework for Discovering Governing Equations Under Uncertainty
- Award number: 2348495
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Amount: $174,900
- Award type: Standard Grant