Collaborative Research: CNS Core: Small: Optimizing Large-Scale Heterogeneous ML Platforms
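Basic information
- Award number: 2146909
- Principal investigator:
- Amount: $250,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-01-01 to 2024-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract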
Large-scale artificial intelligence and machine learning (AI/ML) platforms are playing a vital role in the current data revolution. To minimize user effort, an end-to-end solution is desired for deploying complex workflows over possibly heterogeneous computing clusters. However, the scheduling and resource management problems behind such “push-button” deployment are challenging. If left unsolved, these costly systems will be severely under-utilized, leading to unnecessary electricity consumption and greenhouse gas emissions. This project will develop efficient resource allocation policies for distributed, large-scale AI/ML systems to tackle these challenges. Specifically, this project will accelerate and parallelize the large-scale optimization and inference tasks that dominate workloads in AI/ML platforms via distributed optimization that provides fault tolerance and robustness to stragglers in heterogeneous settings. Building on this distributed optimization, the project will further schedule AI/ML workflows with precedence constraints among sub-tasks. Finally, heterogeneous resources will be allocated among jobs fairly and efficiently in the case where the resources being allocated are exchangeable, which is key for AI/ML platforms with graphics processing units (GPUs) and other accelerators. The project will provide new fundamental algorithms for scheduling and resource allocation in AI/ML platforms used across academia and industry. The algorithmic ideas will be developed in the context of core, classical models and so will apply more broadly than AI/ML platforms, e.g., to networking, storage, supply chain management, and beyond.
The project will seek to broaden the participation of underrepresented groups in Science, Technology, Engineering and Mathematics through planned activities including the development of accelerated mathematics programs for middle-school students, summer programs for middle-school and high-school students, and summer research programs for undergraduate students. The project will make its software artifacts, datasets, and research results available to the research community on the project website at https://adamwierman.com/optimizing-large-scale-heterogeneous-ml-platforms/. Artifacts will be maintained for a minimum of 10 years. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal papers (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Deep Learning-Assisted Online Task Offloading for Latency Minimization in Heterogeneous Mobile Edge
- DOI: 10.1109/tmc.2023.3285882
- Published: 2024-05
- Journal:
- Impact factor: 7.9
- Authors: Yu Liu; Yingling Mao; Z. Liu; Yuanyuan Yang
- Corresponding authors: Yu Liu; Yingling Mao; Z. Liu; Yuanyuan Yang
Applied Online Algorithms with Heterogeneous Predictors
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Jessica Maghakian; Russell Lee; M. Hajiesmaili; Jian Li; R. Sitaraman; Zhenhua Liu
- Corresponding authors: Jessica Maghakian; Russell Lee; M. Hajiesmaili; Jian Li; R. Sitaraman; Zhenhua Liu
Online Container Scheduling for Data-intensive Applications in Serverless Edge Computing
- DOI: 10.1109/infocom53939.2023.10229034
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Xiaojun Shang; Yingling Mao; Yu Liu; Yaodong Huang; Zhen Liu; Yuanyuan Yang
- Corresponding authors: Xiaojun Shang; Yingling Mao; Yu Liu; Yaodong Huang; Zhen Liu; Yuanyuan Yang
Energy-Aware Online Task Offloading and Resource Allocation for Mobile Edge Computing
- DOI: 10.1109/icdcs57875.2023.00073
- Published: 2023-07
- Journal:
- Impact factor: 0
- Authors: Yu Liu; Yingling Mao; Xiaojun Shang; Z. Liu; Yuanyuan Yang
- Corresponding authors: Yu Liu; Yingling Mao; Xiaojun Shang; Z. Liu; Yuanyuan Yang
Joint Task Offloading and Resource Allocation in Heterogeneous Edge Environments
- DOI: 10.1109/infocom53939.2023.10229015
- Published: 2023-05
- Journal:
- Impact factor: 0
- Authors: Yu Liu; Yingling Mao; Z. Liu; Fan Ye; Yuanyuan Yang
- Corresponding authors: Yu Liu; Yingling Mao; Z. Liu; Fan Ye; Yuanyuan Yang
Other publications by Zhenhua Liu
From Darkness to Light: Pretargeted Radionuclide Imaging Driven by Tetrazine Bioorthogonal Chemistry
- DOI:
- Published: 2018
- Journal:
- Impact factor: 3.4
- Authors: Guohua Shen; Anren Kuang; Zhenhua Liu; Yige Bao; Haoxing Wu
- Corresponding author: Haoxing Wu
Experimental research on boiling heat transfer characteristics of compact staggered tube bundles in reduced pressures
- DOI: 10.3901/jme.2007.08.224
- Published: 2007
- Journal:
- Impact factor: 4.2
- Authors: Zhenhua Liu
- Corresponding author: Zhenhua Liu
Copper-catalyzed C–N bond formation with imidazo[1,2-a]pyridines
- DOI: 10.1039/c8ob01853g
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Kai Sun; Shiqiang Mu; Zhenhua Liu; Ranran Feng; Yali Li; Kui Pang; Bing Zhang
- Corresponding author: Bing Zhang
Pax2-cre-mediated deletion of Lgl1 causes abnormal development of the midbrain
- DOI:
- Published: 2024
- Journal:
- Impact factor: 2.7
- Authors: Congzhe Hou; Aizhen Zhang; Tingting Zhang; Chao Ye; Zhenhua Liu; Jiangang Gao
- Corresponding author: Jiangang Gao
Estimation of the homoplasmy degree for transplastomic tobacco using quantitative real-time PCR
- DOI: 10.1007/s00217-010-1265-z
- Published: 2010
- Journal:
- Impact factor: 3.3
- Authors: Huifeng Shen; Bingjun Qian; Litao Yang; W. Liang; Weiwei Chen; Zhenhua Liu; Dabing Zhang
- Corresponding author: Dabing Zhang
Other grants held by Zhenhua Liu
Collaborative Research: CNS Core: Medium: Dynamic Data-driven Systems - Theory and Applications
- Award number: 2106027
- Fiscal year: 2021
- Amount: $250,000
- Award type: Standard Grant
CAREER: An adaptive framework to accelerate real-time workloads in heterogeneous and reconfigurable environments
- Award number: 2046444
- Fiscal year: 2021
- Amount: $250,000
- Award type: Continuing Grant
NeTS: Small: Collaborative Research: Enabling Application-Level Performance Predictability in Public Clouds
- Award number: 1617698
- Fiscal year: 2016
- Amount: $250,000
- Award type: Standard Grant
CRII: NeTS: Enabling Demand Response from Cloud Data Centers -- from Sustainable IT to IT for Sustainability
- Award number: 1464388
- Fiscal year: 2015
- Amount: $250,000
- Award type: Standard Grant
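Similar grants from the National Natural Science Foundation of China (NSFC)
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Amount: CNY 0
- Award type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Amount: CNY 240,000
- Award type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Amount: CNY 450,000
- Award type: General program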
Similar overseas grants
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
- Award number: 2230945
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Medium: Movement of Computation and Data in Splitkernel-disaggregated, Data-intensive Systems
- Award number: 2406598
- Fiscal year: 2023
- Amount: $250,000
- Award type: Continuing Grant
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
- Award number: 2418188
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Medium: Reconfigurable Kernel Datapaths with Adaptive Optimizations
- Award number: 2345339
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: NSF-AoF: CNS Core: Small: Towards Scalable and AI-based Solutions for Beyond-5G Radio Access Networks
- Award number: 2225578
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Small: Creating An Extensible Internet Through Interposition
- Award number: 2242503
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Small: Adaptive Smart Surfaces for Wireless Channel Morphing to Enable Full Multiplexing and Multi-user Gains
- Award number: 2343959
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Small: Efficient Ways to Enlarge Practical DNA Storage Capacity by Integrating Bio-Computer Technologies
- Award number: 2343863
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
- Award number: 2341378
- Fiscal year: 2023
- Amount: $250,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Medium: Innovating Volumetric Video Streaming with Motion Forecasting, Intelligent Upsampling, and QoE Modeling
- Award number: 2409008
- Fiscal year: 2023
- Amount: $250,000
- Award type: Continuing Grant