Collaborative Research: CNS Core: Small: A Principled Framework for Workload Distribution Techniques in Large-Scale Networks

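Basic Information

  • Award number:
    2008624
  • Principal investigator:
  • Amount:
    $333,500
  • Institution:
  • Institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-10-01 to 2023-09-30
  • Project status:
    Completed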

Project Abstract

Over the last decade, distributed computing and big data analytics have enabled unprecedented advances in human life, including in medicine and health, education, and business, and have stimulated new careers. Distributed computing is also fundamental to the computing industry, a significant economic engine for the US. However, traditional approaches to distributed computing are developed as ad hoc solutions to individual applications. In the classical paradigm, the system designer specifies a simple model of the network, along with a few low-level design goals, such as high utilization and low job completion time, and then develops a fixed algorithm to distribute the computation across workers. Although this paradigm has resulted in heuristics that work in practice, networks and applications continuously grow in complexity and heterogeneity. As a result, the critical task of designing workload distribution algorithms that work well across a variety of conditions has become exceedingly difficult. This proposal addresses that challenge by developing a general framework that can be used as applications and networks grow. Ultimately, it will make distributed computing more explainable and better tailored to the needs of applications.
Workload distribution has a long and rich history. However, the existing literature lacks design principles for reasoning about compute versus communication trade-offs in large-scale networks. This proposal seeks to develop a principled framework for workload distribution techniques. It aims to provide the mathematical foundations behind function computation in distributed networks, where a function is an abstraction of a computation task, such as training a neural network, indexing the web, or processing queries. With these foundations, the operator does not have to rely on heuristics or simplified models to decide on workload distribution. Instead, the proposed framework offers the trade-off space between cost and performance for the best use of available resources.
This proposal aims to address the fundamental challenge of parallel function computation in distributed networks, and to enable rigorous mathematical analysis of deployed approaches, by (i) developing a series of core principles for workload distribution systems through analyzing a variety of applications, including datacenter job scheduling, decentralized Stochastic Gradient Descent training, and erasure coding for inference jobs, and (ii) devising a novel scheduling framework for distributing computation tasks in distributed networks. The proposed framework leverages Little's Law to minimize both communication and computation times when designing practical, robust, and high-performance workload distribution algorithms. The PIs will evaluate the proposed scheduler against state-of-the-art heuristic algorithms and pinpoint the constraints and features that make each heuristic a special use case of the generic framework.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
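The abstract names Little's Law but does not say how the framework applies it. As a minimal sketch of the law itself (L = λW: in a stable system, the mean number of jobs present equals the arrival rate times the mean time a job spends in the system), the Python snippet below uses hypothetical function names and made-up numbers. It illustrates the identity only and is not the PIs' scheduler.

```python
def mean_jobs_in_system(arrival_rate: float, mean_time_in_system: float) -> float:
    """Little's Law: L = lambda * W.

    arrival_rate        -- lambda, jobs arriving per second (hypothetical unit)
    mean_time_in_system -- W, mean seconds a job spends queued plus in service
    """
    return arrival_rate * mean_time_in_system


def mean_time_in_system(mean_jobs: float, arrival_rate: float) -> float:
    """Little's Law solved for W: W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return mean_jobs / arrival_rate


# Illustrative numbers only: a worker holding 12 jobs on average
# while receiving 4 jobs per second.
w = mean_time_in_system(mean_jobs=12, arrival_rate=4)
print(w)  # 3.0 seconds per job on average
```

Because the identity holds without assumptions about the arrival or service distributions (beyond stability), it lets an operator estimate mean job latency from two quantities that are easy to measure: queue occupancy and arrival rate.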

Project Outcomes

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Function Load Balancing Over Networks
Joint Optimization of Storage and Transmission via Coding Traffic Flows for Content Distribution
In-network Aggregation for Shared Machine Learning Clusters
  • DOI:
  • Year published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nadeen Gebara; M. Ghobadi; Paolo Costa
  • Corresponding authors:
    Nadeen Gebara; M. Ghobadi; Paolo Costa
ARES: Adaptive, Reconfigurable, Erasure Coded, Atomic Storage
Contention Resolution for Coded Radio Networks

Other Grants by Manya Ghobadi

CAREER: Large-scale Dynamic Reconfigurable Networks
  • Award number:
    2144766
  • Fiscal year:
    2022
  • Award amount:
    $333,500
  • Project type:
    Continuing Grant
Collaborative Research: CNS Core: Medium: A Stateful Switch Architecture for In-Network Compute
  • Award number:
    2211382
  • Fiscal year:
    2022
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: SHF: Medium: Spatial Multi-Tenant Neural Acceleration for Next Generation Datacenters
  • Award number:
    2107244
  • Fiscal year:
    2021
  • Award amount:
    $333,500
  • Project type:
    Continuing Grant
ASCENT: Collaborative Research: Scaling Distributed AI Systems based on Universal Optical I/O
  • Award number:
    2023468
  • Fiscal year:
    2020
  • Award amount:
    $333,500
  • Project type:
    Standard Grant

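Similar NSFC Grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Award amount:
    CNY 0
  • Project type:
    Provincial/Municipal Project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Award amount:
    CNY 240,000
  • Project type:
    Special Fund Project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Award amount:
    CNY 240,000
  • Project type:
    Special Fund Project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Award amount:
    CNY 240,000
  • Project type:
    Special Fund Project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Award amount:
    CNY 450,000
  • Project type:
    General Program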

Similar Overseas Grants

Collaborative Research: CNS Core: Medium: Reconfigurable Kernel Datapaths with Adaptive Optimizations
  • Award number:
    2345339
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2230945
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: NSF-AoF: CNS Core: Small: Towards Scalable and AI-based Solutions for Beyond-5G Radio Access Networks
  • Award number:
    2225578
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Medium: Movement of Computation and Data in Splitkernel-disaggregated, Data-intensive Systems
  • Award number:
    2406598
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Continuing Grant
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
  • Award number:
    2418188
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: Creating An Extensible Internet Through Interposition
  • Award number:
    2242503
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: Adaptive Smart Surfaces for Wireless Channel Morphing to Enable Full Multiplexing and Multi-user Gains
  • Award number:
    2343959
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: Efficient Ways to Enlarge Practical DNA Storage Capacity by Integrating Bio-Computer Technologies
  • Award number:
    2343863
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2341378
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant
Collaborative Research: CISE-MSI: RCBP-RF: CNS: ESD4CDaT - Efficient System Design for Cancer Detection and Treatment
  • Award number:
    2318573
  • Fiscal year:
    2023
  • Award amount:
    $333,500
  • Project type:
    Standard Grant