
OAC Core: SHF: Small: Enabling Rapid Design and Deployment of Deep Learning Models on Hardware Accelerators


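Basic information

Award number:
1909900
Principal investigator:
Tushar Krishna
Amount:
$500K
Host institution:
Georgia Tech Research Corporation
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Project period:
2019-06-01 to 2023-05-31
Project status:
Completed

Source: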
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1909900&HistoricalAwards=false
Keywords:
OAC Core SHF Small Enabling

Project abstract

Machine Learning (ML) has rapidly emerged as one of the foundational technologies of this century. It is pervasive in our lives today, from allowing us to unlock smartphones to powering recommendation engines for almost any human activity (dining, movies, services, etc.). The applications of ML are expected to become even more transformative in the future, especially in healthcare, autonomous transport, robotics, agriculture, education, and space exploration. ML models are computationally expensive, need large amounts of memory to store the trained model, and have strict runtime requirements. They cannot be run efficiently on general-purpose processors, which has led to explosive growth in custom hardware accelerators for ML. However, getting good performance and energy efficiency from these accelerators is itself challenging, as it relies on three components: the ML model itself, the hardware parameters, and the scheduling of computations in the ML model onto the limited compute and memory resources on the accelerator. The proposed research will develop an open-source software cyberinfrastructure called MAESTRO that can be used to analytically determine the performance and energy efficiency of ML models over target hardware platforms, prior to actually building the hardware and deploying the model. MAESTRO will be extremely useful for students, researchers, and industry practitioners alike to learn about, design, and deploy custom ML solutions. The project will also engage undergraduate and high-school students to teach them about ML through outreach activities involving hackathons and hardware building.

Mapping ML computations over finite compute elements within an accelerator, and understanding the corresponding data that needs to move across the memory hierarchy, is a non-trivial problem; the space of all possible ways of slicing and dicing the model (known as "dataflow") is exponentially complex, and the benefits of any mapping vary across ML models and target accelerators. To address this, the PI will first develop a set of data-centric directives to directly describe the mapping of the ML model over the accelerator, which will enable precise calculations of data reuse opportunities across space and time to reduce overall data movement. Next, the PI will develop the MAESTRO analytical cost model framework to estimate reuse, end-to-end performance, and energy over the target hardware. Finally, a set of tools will be developed around MAESTRO to automatically search for and determine the optimal hardware/mapping/model given constraints of runtime, power, energy, or area. The proposed framework will enable iterative innovation and co-design across the ML model, mapping, and target hardware, and will therefore be highly valuable for ML model developers, compiler writers, and computer architects. MAESTRO will be released and maintained under an open-source license, and the PI will run periodic tutorials to build an active user base in the research community.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.


Project outputs

Journal papers (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores

DOI:
10.1109/hpca53966.2022.00065
Published:
2021-04
Journal:
2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Impact factor:
0
Authors:
Sheng-Chun Kao;T. Krishna
Corresponding authors:
Sheng-Chun Kao;T. Krishna

DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators

DOI:
10.23919/date54114.2022.9774568
Published:
2022
Journal:
Automation & Test in Europe Conference & Exhibition (DATE
Impact factor:
0
Authors:
Kao, Sheng-Chun;Pellauer, Michael;Parashar, Angshuman;Krishna, Tushar
Corresponding author:
Krishna, Tushar

MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings

DOI:
10.1109/mm.2020.2985963
Published:
2020
Journal:
IEEE Micro
Impact factor:
3.6
Authors:
Kwon, Hyoukjun;Chatarasi, Prasanth;Sarkar, Vivek;Krishna, Tushar;Pellauer, Michael;Parashar, Angshuman
Corresponding author:
Parashar, Angshuman

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

DOI:
10.1109/hpca51647.2021.00016
Published:
2020-12
Journal:
2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Impact factor:
0
Authors:
Hyoukjun Kwon;Liangzhen Lai;Michael Pellauer;T. Krishna;Yu-hsin Chen;V. Chandra
Corresponding authors:
Hyoukjun Kwon;Liangzhen Lai;Michael Pellauer;T. Krishna;Yu-hsin Chen;V. Chandra

GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm

DOI:
10.1145/3400302.3415639
Published:
2020-11
Journal:
2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Impact factor:
0
Authors:
Sheng-Chun Kao;T. Krishna
Corresponding authors:
Sheng-Chun Kao;T. Krishna

Other publications by Tushar Krishna

Bridging the Frequency Gap in Heterogeneous 3D SoCs through Technology-Specific NoC Router Architectures

DOI:
10.1145/3394885.3431421
Published:
2021
Journal:
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)
Impact factor:
0
Authors:
J. M. Joseph;L. Bamberg;J. Geonhwa;Ruei-Ting Chien;Rainer Leupers;Alberto García-Ortiz;Tushar Krishna;Thilo Pionteck
Corresponding author:
Thilo Pionteck

H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations

DOI:
Published:
2024
Journal:
Design, Automation and Test in Europe
Impact factor:
0
Authors:
Zishen Wan;Che;Mohamed Ibrahim;Hanchen Yang;S. Spetalnick;Tushar Krishna;A. Raychowdhury
Corresponding author:
A. Raychowdhury

SDQ: Sparse Decomposed Quantization for LLM Inference

DOI:
Published:
2024
Journal:
Impact factor:
0
Authors:
Geonhwa Jeong;Po;S. Keckler;Tushar Krishna
Corresponding author:
Tushar Krishna

Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

DOI:
10.48550/arxiv.2404.03216
Published:
2024
Journal:
ArXiv
Impact factor:
0
Authors:
Jianming Tong;Jing Dang;Anupam Golder;Callie Hao;A. Raychowdhury;Tushar Krishna
Corresponding author:
Tushar Krishna