Collaborative Research: CNS Core: Small: Optimizing Large-Scale Heterogeneous ML Platforms


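Basic Information

  • Award number:
    2146814
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Duration:
    2022-01-01 to 2024-12-31
  • Project status:
    Completed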

Project Abstract

Large-scale artificial intelligence and machine learning (AI/ML) platforms play a vital role in the current data revolution. To minimize the effort required of users, an end-to-end solution is needed that deploys complex workflows over possibly heterogeneous computing clusters. However, the scheduling and resource management problems behind such “push-button” deployment are challenging. If left unsolved, these costly systems will be severely under-utilized, leading to unnecessary electricity consumption and greenhouse gas emissions. This project will develop efficient resource allocation policies for distributed, large-scale AI/ML systems to tackle these challenges. Specifically, the project will accelerate and parallelize the large-scale optimization and inference tasks that dominate AI/ML platform workloads, using distributed optimization that is fault tolerant and robust to stragglers in heterogeneous settings. Building on this distributed optimization, the project will further schedule AI/ML workflows with precedence constraints among sub-tasks. Finally, it will allocate heterogeneous resources among jobs fairly and efficiently when the resources being allocated are exchangeable, a setting that is key for AI/ML platforms with graphics processing units (GPUs) and other accelerators. The project will provide new fundamental algorithms for scheduling and resource allocation in AI/ML platforms used across academia and industry. Because the algorithmic ideas will be developed in the context of core, classical models, they will apply more broadly than AI/ML platforms, e.g., to networking, storage, supply chain management, and beyond.
The project will seek to broaden the participation of underrepresented groups in Science, Technology, Engineering and Mathematics through planned activities, including the development of accelerated mathematics programs for middle school students, summer programs for middle-school and high-school students, and summer research programs for undergraduate students. The project will make its software artifacts, datasets, and research results available to the research community on the project website at https://adamwierman.com/optimizing-large-scale-heterogeneous-ml-platforms/. Artifacts will be maintained for a minimum of 10 years. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (15)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Online Pause and Resume Problem: Optimal Algorithms and An Application to Carbon-Aware Load Shifting
The Online Knapsack Problem with Departures
Smoothed Online Optimization with Unreliable Predictions
Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions
Chasing convex bodies and functions with black-box advice
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nicolas Christianson, Tinashe Handina
  • Corresponding author:
    Nicolas Christianson, Tinashe Handina

Other Publications by Adam Wierman

Best of Both Worlds: Stochastic and Adversarial Convex Function Chasing
  • DOI:
    10.48550/arxiv.2311.00181
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Neelkamal Bhuyan;Debankur Mukherjee;Adam Wierman
  • Corresponding author:
    Adam Wierman
Characterizing the impact of the workload on the value of dynamic resizing in data centers
  • DOI:
    10.1145/2254756.2254815
  • Published:
    2012-06
  • Journal:
  • Impact factor:
    2.2
  • Authors:
    Minghong Lin;Florin Ciucu;Adam Wierman;Chuang Lin
  • Corresponding author:
    Chuang Lin
A view of the sustainable computing landscape
  • DOI:
    10.1016/j.patter.2025.101296
  • Published:
    2025-07-11
  • Journal:
  • Impact factor:
    7.400
  • Authors:
    Benjamin C. Lee;David Brooks;Arthur van Benthem;Mariam Elgamal;Udit Gupta;Gage Hills;Vincent Liu;Linh Thi Xuan Phan;Benjamin Pierce;Christopher Stewart;Emma Strubell;Gu-Yeon Wei;Adam Wierman;Yuan Yao;Minlan Yu
  • Corresponding author:
    Minlan Yu
Pricing Uncertainty in Stochastic Multi-Stage Electricity Markets
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lucien Werner;Nicolas H. Christianson;Alessandro Zocca;Adam Wierman;Steven H. Low
  • Corresponding author:
    Steven H. Low
Distributionally Robust Constrained Reinforcement Learning under Strong Duality
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhengfei Zhang;Kishan Panaganti;Laixi Shi;Yanan Sui;Adam Wierman;Yisong Yue
  • Corresponding author:
    Yisong Yue


Other Grants by Adam Wierman

Collaborative Research: NGSDI: CarbonFirst: A Sustainable and Reliable Carbon-Centric Cloud-Edge Software Infrastructure
  • Award number:
    2105648
  • Fiscal year:
    2021
  • Award amount:
    $250,000
  • Project type:
    Continuing Grant
Collaborative Research: CPS: Medium: Enabling DER Integration via Redesign of Information Flows
  • Award number:
    2136197
  • Fiscal year:
    2021
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Medium: Dynamic Data-driven Systems - Theory and Applications
  • Award number:
    2106403
  • Fiscal year:
    2021
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
NeTS: Large: Networked Markets: Theory and Applications
  • Award number:
    1518941
  • Fiscal year:
    2015
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
CPS: Synergy: Collaborative Research: Beyond Stability: Performance, Efficiency and Disturbance Management for Smart Infrastructure Systems
  • Award number:
    1545096
  • Fiscal year:
    2015
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
CAREER: The Value of Privacy
  • Award number:
    1254169
  • Fiscal year:
    2013
  • Award amount:
    $250,000
  • Project type:
    Continuing Grant
CSR: Small: Collaborative Research: Data Center Demand Response: Coordinating the Cloud and the Smart Grid
  • Award number:
    1319820
  • Fiscal year:
    2013
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: A Unified Approach to Quantifying Market Power in the Future Grid
  • Award number:
    1307794
  • Fiscal year:
    2013
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
ICES: Small: A Revealed Preference Approach to Computational Complexity in Economics
  • Award number:
    1101470
  • Fiscal year:
    2011
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
CAREER: Towards a rigorous foundation for scheduling in modern systems
  • Award number:
    0846025
  • Fiscal year:
    2009
  • Award amount:
    $250,000
  • Project type:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) Grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Award amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Award amount:
    CNY 450,000
  • Project type:
    General program

Similar Overseas Grants

Collaborative Research: CNS Core: Medium: Reconfigurable Kernel Datapaths with Adaptive Optimizations
  • Award number:
    2345339
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2230945
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: NSF-AoF: CNS Core: Small: Towards Scalable and AI-based Solutions for Beyond-5G Radio Access Networks
  • Award number:
    2225578
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Medium: Movement of Computation and Data in Splitkernel-disaggregated, Data-intensive Systems
  • Award number:
    2406598
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Continuing Grant
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
  • Award number:
    2418188
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: Creating An Extensible Internet Through Interposition
  • Award number:
    2242503
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: Adaptive Smart Surfaces for Wireless Channel Morphing to Enable Full Multiplexing and Multi-user Gains
  • Award number:
    2343959
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: Efficient Ways to Enlarge Practical DNA Storage Capacity by Integrating Bio-Computer Technologies
  • Award number:
    2343863
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2341378
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Medium: Innovating Volumetric Video Streaming with Motion Forecasting, Intelligent Upsampling, and QoE Modeling
  • Award number:
    2409008
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Project type:
    Continuing Grant