Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)

Basic Information

Award number:
2230945
Principal investigator:
Gagan Agrawal
Amount:
$300,000
Institution:
AUGUSTA UNIVERSITY RESEARCH INSTITUTE, INC.
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Period:
2023-10-01 to 2023-11-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2230945&HistoricalAwards=false
Keywords:
Collaborative Research CNS Core Small

Project Abstract

As Machine Learning (ML), and especially Deep Neural Network (DNN) workloads have rapidly become prominent, many existing architectures have been enriched with instructions and/or processing capabilities targeting these workloads. Examples of these instructions include AMX instructions from Intel, Tensor cores from NVIDIA, DOT instructions from AMD, and many others. The emergence of such tensorized instructions is leading to many common and related challenges regarding how they can be used for production-level modern DNNs. The current state-of-the-art for exploiting these instruction sets for DNN workloads is very limited, with existing systems either completely lacking attention on these, not addressing global optimizations for complex DNNs, or being limited in other ways. The premise of our work is that a compilation system that is cognizant of the latest DNN trends and can optimize across different tensorized instruction sets, will provide large efficiency gains for modern ML computations. The resulting agenda will likely result in significant technical, economic, and societal impacts. From the technical side, the work impacts areas like High-Performance Computing (HPC), Compilers, and systems supporting AI/ML workloads. As DNNs are becoming an integral part of applications that most humans use, this work is poised to have a large economic and societal impact. 
On the education side, the research at the intersection of systems and ML will be incorporated into multiple courses and help to increase diversity at all levels in computing education and research, particularly by involving members from underrepresented groups.

This project addresses the following challenges associated with modern DNNs and recent and emerging tensorized instructions:

1) Local Instruction Selection for Dense Models -- To improve the execution efficiency of each operator, a critical first issue is selecting tensorized instructions (and associated data layouts), which will be addressed for arbitrary shapes of operators.
2) Global Optimizations for DNNs -- After local operator optimizations, each operator may prefer its own tensorized instruction and data layout, thus incurring significant data layout transformation costs during the execution of an entire DNN. This project formulates and solves a global optimization problem that chooses the right trade-off between the local operator execution and data transformation costs.
3) Optimizations for Dynamic DNNs -- This project also considers various forms of dynamism in modern DNN models including dynamic input shapes, dynamic control flows, and dynamic data structures. It proposes new optimizations such as those for effective memory management, while revisiting others like local and global instruction selection, in the presence of these forms of dynamism.
4) Mapping Sparse Models to Emerging Instructions -- This project also plans to improve the efficiency of using various types of tensorized instructions when sparsity is involved, building on top of earlier work for optimizing kernels like SpMM (and other sparse computations) on GPUs and SIMD instruction sets.
5) (Semi-) Automatic Support for New Instructions -- To minimize the optimization and programming effort, this proposal also introduces a module to automatically optimize DNN computations with new tensorized instructions or features.
Besides addressing the above problems, one critical component of this project will be incorporating their implementations, together with code generation for multiple back-ends, in a reusable system. This system will take as the input the Computational Graph representation, and output Tensor and LLVM IRs, thus building around three representations widely used in the industry.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
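
The trade-off named in challenge 2 can be illustrated with a small sketch. It is not the project's formulation: it assumes a linear chain of operators (real DNN graphs branch), and the function name `select_layouts`, the layout names `NCHW`/`NHWC`, and all cost numbers are hypothetical values chosen for illustration. A dynamic program picks one layout per operator so that execution cost plus layout-conversion cost is minimal.

```python
from typing import Dict, List, Tuple

# Hypothetical cost tables for illustration only; none of these numbers come from the project.
ExecCosts = List[Dict[str, float]]             # exec_costs[i][layout] = cost of operator i in that layout
TransformCosts = Dict[Tuple[str, str], float]  # cost of converting data from one layout to another


def select_layouts(exec_costs: ExecCosts, transform_costs: TransformCosts) -> Tuple[float, List[str]]:
    """Pick one layout per operator in a linear chain, minimizing execution plus conversion cost."""
    # best[layout] = (cheapest total so far, layouts chosen so far) with the last operator in `layout`
    best = {layout: (cost, [layout]) for layout, cost in exec_costs[0].items()}
    for costs in exec_costs[1:]:
        new_best = {}
        for cur, run_cost in costs.items():
            candidates = []
            for prev, (total, path) in best.items():
                # Staying in the same layout needs no conversion.
                convert = 0.0 if prev == cur else transform_costs[(prev, cur)]
                candidates.append((total + convert + run_cost, path + [cur]))
            new_best[cur] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


exec_costs = [
    {"NCHW": 4.0, "NHWC": 6.0},
    {"NCHW": 7.0, "NHWC": 3.0},
    {"NCHW": 4.0, "NHWC": 5.0},
]
transform_costs = {("NCHW", "NHWC"): 2.0, ("NHWC", "NCHW"): 2.0}

total, layouts = select_layouts(exec_costs, transform_costs)
print(total, layouts)  # 14.0 ['NCHW', 'NHWC', 'NHWC']
```

With these made-up numbers, choosing each operator's locally cheapest layout (NCHW, NHWC, NCHW) costs 4 + 2 + 3 + 2 + 4 = 15 once the two conversions are counted, while the global choice costs 14 by keeping the last operator in a slightly slower layout and avoiding one conversion.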

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Gagan Agrawal

MMIS-07, 08: Mining Multiple Information Sources Workshop Report

DOI:
Publication date:
Journal:
SIGKDD Explorations
Impact factor:
0
Authors:
Xingquan Zhu; Gagan Agrawal; Yuri Breitbart; Ruoming Jin
Corresponding author:
Ruoming Jin

Middleware for data mining applications on clusters and grids

DOI:
10.1016/j.jpdc.2007.06.007
Publication date:
2008-01-01
Journal:
Research article
Impact factor:
Authors:
Leonid Glimcher; Ruoming Jin; Gagan Agrawal
Corresponding author:
Gagan Agrawal

POSTER: MDS-044 Cancer Disparities in Survival of Patients With Hematologic Malignancies in the Context of Social Determinants of Health: A Systematic Review

DOI:
10.1016/s2152-2650(23)00577-3
Publication date:
2023-09-01
Journal:
Conference abstract
Impact factor:
Authors:
Marisol Miranda-Galvis; Kellen Tjioe; Andrew Balas; Gagan Agrawal; Jorge Cortes
Corresponding author:
Jorge Cortes

Organizing Records for Retrieval in Multi-Dimensional Range Searchable Encryption

DOI:
Publication date:
2024
Journal:
IACR Cryptology ePrint Archive
Impact factor:
0
Authors:
Mahdieh Heidaripour; Ladan Kian; Maryam Rezapour; Mark Holcomb; Benjamin Fuller; Gagan Agrawal; Hoda Maleki
Corresponding author:
Hoda Maleki

The interaction between social determinants of health and cervical cancer survival: A systematic review

DOI:
10.1016/j.ygyno.2023.12.020
Publication date:
2024-02-01
Journal:
Gynecologic Oncology
Impact factor:
4.100
Authors:
Kellen Cristine Tjioe; Marisol Miranda-Galvis; Marian Symmes Johnson; Gagan Agrawal; E. Andrew Balas; Jorge E. Cortes
Corresponding author:
Jorge E. Cortes