Collaborative Research: SHF: SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity

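Basic Information

  • Award number:
    2146873
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-06-15 to 2025-05-31
  • Project status:
    Ongoing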

Project Abstract

Heterogeneity in computing refers to having a variety of devices present within one computing system, or even within one node of a cluster. Several technological trends are making a high degree of heterogeneity inevitable in High Performance Computing (HPC), prompting research in many directions. The traditional scheduling problem, which takes a set of programs to be executed and maps them to the available resources, becomes more complicated in the presence of such heterogeneity because the scheduler also needs to interact with the compiler. The goal of this project is to consider new paradigms for application execution in view of these developments and to conduct research on execution-time prediction, compilation, parallelization, and scheduling. Traditionally, three steps are carried out sequentially and independently: deciding (likely manually) how an application is to be parallelized, compiling it, and scheduling it at the cluster level. The investigators posit that treating these steps in isolation will not be acceptable when optimizing for multi-tenant heterogeneous clusters. Instead, they envision a requirement that can be referred to as EASER -- compilE-pArallelize-Schedule-rEtarget-Repeat. In the EASER paradigm, the compiler first maps the core functions to a specific device and generates predictions of execution time. These predictions are input to the parallelization approach selection module, and together the two produce a final executable. This binary is then presented to the scheduler, which assesses the job queue and might suggest alternative configuration(s)/device(s). If so, a retargeting module is invoked, potentially leading to a repetition of the above steps. This project develops, supports, and evaluates the EASER framework in the context of a cluster that executes emerging machine learning (ML) workloads.
Research is proposed in the following areas:
1) Compiler-Driven Performance Prediction -- This includes a novel strategy that comprises a general model for predicting SIMD/VLIW performance and an operator-classification-based approach to developing a memory hierarchy performance model.
2) Integrated Job Scheduling and Parallelization Strategy Selection -- Building on the performance prediction models, these two (conventionally independent) modules are integrated by including parameterized and incremental parallelization strategy selection methods and by aggressively reducing the search space in scheduling methods.
3) Retargeting Compiler -- By classifying optimizations as either architecture-dependent or architecture-independent, a retargeting compiler for ML workloads will be developed.

This project will also make several contributions to education and human resource development. Both investigators will introduce courses or course material at the intersection of computer systems and machine learning, bringing attention to ML-related workloads in computer systems education. A majority of funds at each University will be used to support Ph.D. students in their research, who will be trained to work across traditional (sub-)areas. Both investigators are strongly committed to increasing diversity in computing fields and have a strong record of supervising members of underrepresented groups in their research programs. Building on their Universities' existing connections, they will work further on improving diversity at all levels.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
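
The compile, parallelize, schedule, retarget, repeat loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the project: the names (`compile_for`, `select_parallelization`, `schedule`, `easer`), the per-device cost table, the 10-second target, the ideal linear speedup, and the rule the scheduler uses to suggest another device.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical cost model: predicted seconds per unit of work on each device.
# The numbers are invented for illustration only.
SECONDS_PER_UNIT: Dict[str, float] = {"cpu": 4.0, "gpu": 1.0, "fpga": 2.0}
TARGET_SECONDS = 10.0  # assumed run-time target used to pick a worker count


@dataclass
class Executable:
    job: str
    device: str
    workers: int
    predicted_seconds: float


def compile_for(work_units: int, device: str) -> float:
    """Compile: map the job to one device and predict its serial run time."""
    return work_units * SECONDS_PER_UNIT[device]


def select_parallelization(serial_seconds: float, max_workers: int) -> int:
    """Parallelize: smallest worker count that meets the target, else the maximum."""
    for workers in range(1, max_workers + 1):
        if serial_seconds / workers <= TARGET_SECONDS:
            return workers
    return max_workers


def schedule(exe: Executable, queue_wait: Dict[str, float]) -> Optional[str]:
    """Schedule: suggest the least-loaded device if queueing dominates run time."""
    best = min(queue_wait, key=queue_wait.get)
    extra_wait = queue_wait[exe.device] - queue_wait[best]
    if best != exe.device and extra_wait > exe.predicted_seconds:
        return best
    return None


def easer(job, work_units, first_device, queue_wait, max_workers=4, max_rounds=3):
    """Run compile -> parallelize -> schedule, retargeting while the scheduler asks."""
    device = first_device
    exe = None
    for _ in range(max_rounds):
        serial = compile_for(work_units, device)
        workers = select_parallelization(serial, max_workers)
        # Assumes ideal linear speedup, which real workloads rarely achieve.
        exe = Executable(job, device, workers, serial / workers)
        suggestion = schedule(exe, queue_wait)
        if suggestion is None:
            break  # the scheduler accepts this configuration
        device = suggestion  # retarget, then repeat the steps above
    return exe


if __name__ == "__main__":
    print(easer("demo", 20, "gpu", {"cpu": 0.0, "gpu": 120.0, "fpga": 5.0}))
```

In this sketch the job is first compiled for the GPU. Because the GPU queue is long, the scheduler suggests the idle CPU, and the job is recompiled and re-parallelized for it. The `max_rounds` bound keeps the loop from repeating indefinitely.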

Project Outcomes

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
End-to-End LU Factorization of Large Matrices on GPUs
SparCL: Sparse Continual Learning on the Edge
  • DOI:
    10.48550/arxiv.2209.09476
  • Published:
    2022-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zifeng Wang;Zheng Zhan;Yifan Gong;Geng Yuan;Wei Niu;T. Jian;Bin Ren;Stratis Ioannidis;Yanzhi Wang;Jennifer G. Dy
  • Corresponding authors:
    Zifeng Wang;Zheng Zhan;Yifan Gong;Geng Yuan;Wei Niu;T. Jian;Bin Ren;Stratis Ioannidis;Yanzhi Wang;Jennifer G. Dy
Towards Real-Time Segmentation on the Edge
  • DOI:
    10.1609/aaai.v37i2.25232
  • Published:
    2023-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yanyu Li;Changdi Yang;Pu Zhao;Geng Yuan;Wei Niu;Jiexiong Guan;Hao Tang;Minghai Qin;Qing Jin;Bin Ren;Xue Lin;Yanzhi Wang
  • Corresponding authors:
    Yanyu Li;Changdi Yang;Pu Zhao;Geng Yuan;Wei Niu;Jiexiong Guan;Hao Tang;Minghai Qin;Qing Jin;Bin Ren;Xue Lin;Yanzhi Wang
Survey: Exploiting Data Redundancy for Optimization of Deep Learning
  • DOI:
    10.1145/3564663
  • Published:
    2022-08
  • Journal:
  • Impact factor:
    16.6
  • Authors:
    Jou-An Chen;Wei Niu;Bin Ren;Yanzhi Wang;Xipeng Shen
  • Corresponding authors:
    Jou-An Chen;Wei Niu;Bin Ren;Yanzhi Wang;Xipeng Shen
Towards Socially Acceptable Food Type Recognition
  • DOI:
    10.1109/msn57253.2022.00110
  • Published:
    2022-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Junjie Wang;Jiexiong Guan;Y. Alicia Hong;Hong Xue;Shuangquan Wang;Zhenming Liu;Bin Ren;Gang Zhou
  • Corresponding authors:
    Junjie Wang;Jiexiong Guan;Y. Alicia Hong;Hong Xue;Shuangquan Wang;Zhenming Liu;Bin Ren;Gang Zhou

Other publications by Bin Ren

Development of arteriolar niche and self-renewal of breast cancer stem cells by lysophosphatidic Acid/protein kinase D signaling
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yinan Jiang;Yichen Guo;Jinjin Hao;R. Guenter;J. Lathia;A. Beck;R. Hattaway;D. Hurst;Q. Wang;Yehe Liu;Qi Cao;H. Krontiras;He;R. Silverstein;Bin Ren
  • Corresponding author:
    Bin Ren
Revealing Protein Binding Affinity on Metal Surfaces: An Electrochemistry Approach
  • DOI:
    10.1039/d1cc07098c
  • Published:
    2022
  • Journal:
  • Impact factor:
    4.9
  • Authors:
    Danya Lyu;Pingshi Wang;Shuo Zhang;Guokun Liu;Bin Ren
  • Corresponding author:
    Bin Ren
Development of Weak Signal Recognition and an Extraction Algorithm for Raman Imaging
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    7.4
  • Authors:
    Xin Wang;Guokun Liu;Mengxi Xu;Bin Ren;Zhongqun Tian
  • Corresponding author:
    Zhongqun Tian
Classification of 2-step nilpotent Lie algebras of dimension 8 with 3-dimensional center
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bin Ren;Linsheng Zhu
  • Corresponding author:
    Linsheng Zhu
Grouped Temporal Enhancement Module for Human Action Recognition

Other grants by Bin Ren

Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2230944
  • Fiscal year:
    2023
  • Amount:
    $250,000
  • Award type:
    Standard Grant
EAGER: Collaborative Research: On the Theoretical Foundation of Recommendation System Evaluation
  • Award number:
    2142681
  • Fiscal year:
    2021
  • Amount:
    $250,000
  • Award type:
    Standard Grant
CAREER: Achieving Real-Time Machine Learning with Sparsification-Compilation Co-design
  • Award number:
    2047516
  • Fiscal year:
    2021
  • Amount:
    $250,000
  • Award type:
    Continuing Grant

Similar NSFC grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Amount:
    CNY 0
  • Award type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Amount:
    CNY 240,000
  • Award type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Amount:
    CNY 240,000
  • Award type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Amount:
    CNY 240,000
  • Award type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Amount:
    CNY 450,000
  • Award type:
    General Program

Similar overseas grants

Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
  • Award number:
    2331302
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Small: LEGAS: Learning Evolving Graphs At Scale
  • Award number:
    2331301
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
  • Award number:
    2403134
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
  • Award number:
    2412357
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402804
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
  • Award number:
    2403408
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Toward Understandability and Interpretability for Neural Language Models of Source Code
  • Award number:
    2423813
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Enabling GPU Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
  • Award number:
    2402806
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
  • Award number:
    2403135
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
  • Award number:
    2403409
  • Fiscal year:
    2024
  • Amount:
    $250,000
  • Award type:
    Standard Grant