OAC Core: SHF: Small: Enabling Rapid Design and Deployment of Deep Learning Models on Hardware Accelerators
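Basic information
- Award number: 1909900
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Duration: 2019-06-01 to 2023-05-31
- Project status: Completed
- Source:
- Keywords:
Project abstract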
Machine Learning (ML) has rapidly emerged as one of the foundational technologies of this century. It is pervasive in our lives today, from allowing us to unlock smartphones to powering recommendation engines for almost any human activity (dining, movies, services, etc.). The applications of ML are expected to become even more transformative in the future, especially in healthcare, autonomous transport, robotics, agriculture, education, and space exploration. ML models are computationally expensive, need large amounts of memory to store the trained model, and have strict runtime requirements. They cannot be run efficiently on general-purpose processors, which has led to explosive growth in custom hardware accelerators for ML. However, getting good performance and energy efficiency from these accelerators is itself challenging, as it depends on three components: the ML model itself, the hardware parameters, and the scheduling of computations in the ML model onto the limited compute and memory resources of the accelerator. The proposed research will develop an open-source software cyberinfrastructure called MAESTRO that can be used to analytically determine the performance and energy efficiency of ML models on target hardware platforms, prior to actually building the hardware and deploying the model. MAESTRO will be extremely useful for students, researchers, and industry practitioners alike to learn about, design, and deploy custom ML solutions. The project will also engage undergraduate and high-school students to teach them about ML through outreach activities involving hackathons and hardware building.
Mapping ML computations over finite compute elements within an accelerator, and understanding the corresponding data that needs to move across the memory hierarchy, is a non-trivial problem: the space of all possible ways of slicing and dicing the model (known as "dataflow") is exponentially complex, and the benefits of any mapping vary across ML models and target accelerators. To address this, the PI will first develop a set of data-centric directives to directly describe the mapping of the ML model over the accelerator, which will enable precise calculation of data reuse opportunities across space and time to reduce overall data movement. Next, the PI will develop the MAESTRO analytical cost model framework to estimate reuse, end-to-end performance, and energy on the target hardware. Finally, a set of tools will be developed around MAESTRO to automatically search for and determine the optimal hardware/mapping/model given constraints on runtime, power, energy, or area. The proposed framework will enable iterative innovation and co-design across the ML model, mapping, and target hardware, and will therefore be highly valuable for ML model developers, compiler writers, and computer architects. MAESTRO will be released and maintained under an open-source license, and the PI will run periodic tutorials to build an active user base in the research community.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
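To make the abstract's idea of an analytical cost model concrete, here is a minimal Python sketch. It is not MAESTRO's actual model, directive syntax, or API: the `Conv1D` layer, the `estimate` and `best_mapping` functions, the single `tile_x` mapping parameter, and the energy constants are all hypothetical and chosen only for illustration. The sketch assumes outputs are spread across processing elements (PEs) in tiles, that weights are re-fetched once per tile and shared by all PEs in it, and that overlapping sliding-window inputs are reused within a tile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conv1D:
    """Toy layer: out[x] = sum over s of w[s] * inp[x + s]."""
    out_len: int  # number of output elements
    kernel: int   # number of filter taps


@dataclass(frozen=True)
class Estimate:
    tile_x: int           # outputs computed in parallel per tile (assumed mapping knob)
    macs: int             # multiply-accumulate operations
    buffer_accesses: int  # reads and writes to the shared buffer
    cycles: int           # estimated runtime in cycles
    energy: float         # arbitrary units, not calibrated to any hardware


def estimate(layer: Conv1D, tile_x: int,
             e_mac: float = 1.0, e_buf: float = 6.0) -> Estimate:
    """Estimate cost analytically, without simulating the computation."""
    macs = layer.out_len * layer.kernel
    accesses = layer.out_len  # each output is written back once
    cycles = 0
    for start in range(0, layer.out_len, tile_x):
        width = min(tile_x, layer.out_len - start)  # last tile may be partial
        accesses += layer.kernel              # weights fetched once, shared by the tile
        accesses += width + layer.kernel - 1  # unique inputs under the sliding window
        cycles += layer.kernel                # one filter tap per cycle across the tile
    energy = macs * e_mac + accesses * e_buf
    return Estimate(tile_x, macs, accesses, cycles, energy)


def best_mapping(layer: Conv1D, num_pes: int,
                 max_cycles: Optional[int] = None) -> Optional[Estimate]:
    """Exhaustively search tile sizes; return the lowest-energy feasible one."""
    widest = min(num_pes, layer.out_len)
    candidates = (estimate(layer, t) for t in range(1, widest + 1))
    feasible = [c for c in candidates
                if max_cycles is None or c.cycles <= max_cycles]
    return min(feasible, key=lambda c: c.energy, default=None)
```

Calling `best_mapping(Conv1D(out_len=8, kernel=3), num_pes=8)` selects the widest tile under this toy model, because wider tiles re-fetch weights less often. A real mapping space has many more dimensions, which is why the abstract describes it as exponentially complex and motivates automated search rather than the exhaustive loop used here.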
Project outputs
Journal papers (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores
- DOI: 10.1109/hpca53966.2022.00065
- Published: 2021-04
- Journal:
- Impact factor: 0
- Authors: Sheng-Chun Kao; T. Krishna
- Corresponding authors: Sheng-Chun Kao; T. Krishna
DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators
- DOI: 10.23919/date54114.2022.9774568
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Kao, Sheng-Chun; Pellauer, Michael; Parashar, Angshuman; Krishna, Tushar
- Corresponding author: Krishna, Tushar
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings
- DOI: 10.1109/mm.2020.2985963
- Published: 2020
- Journal:
- Impact factor: 3.6
- Authors: Kwon, Hyoukjun; Chatarasi, Prasanth; Sarkar, Vivek; Krishna, Tushar; Pellauer, Michael; Parashar, Angshuman
- Corresponding author: Parashar, Angshuman
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads
- DOI: 10.1109/hpca51647.2021.00016
- Published: 2020-12
- Journal:
- Impact factor: 0
- Authors: Hyoukjun Kwon; Liangzhen Lai; Michael Pellauer; T. Krishna; Yu-hsin Chen; V. Chandra
- Corresponding authors: Hyoukjun Kwon; Liangzhen Lai; Michael Pellauer; T. Krishna; Yu-hsin Chen; V. Chandra
GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm
- DOI: 10.1145/3400302.3415639
- Published: 2020-11
- Journal:
- Impact factor: 0
- Authors: Sheng-Chun Kao; T. Krishna
- Corresponding authors: Sheng-Chun Kao; T. Krishna
Other publications by Tushar Krishna
Bridging the Frequency Gap in Heterogeneous 3D SoCs through Technology-Specific NoC Router Architectures
- DOI: 10.1145/3394885.3431421
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: J. M. Joseph; L. Bamberg; J. Geonhwa; Ruei-Ting Chien; Rainer Leupers; Alberto García-Oritz; Tushar Krishna; Thilo Pionteck
- Corresponding author: Thilo Pionteck
H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Zishen Wan; Che; Mohamed Ibrahim; Hanchen Yang; S. Spetalnick; Tushar Krishna; A. Raychowdhury
- Corresponding author: A. Raychowdhury
SDQ: Sparse Decomposed Quantization for LLM Inference
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Geonhwa Jeong; Po; S. Keckler; Tushar Krishna
- Corresponding author: Tushar Krishna
Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption
- DOI: 10.48550/arxiv.2404.03216
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Jianming Tong; Jing Dang; Anupam Golder; Callie Hao; A. Raychowdhury; Tushar Krishna
- Corresponding author: Tushar Krishna
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Saeed Rashidi; William Won; S. Srinivasan; Puneet Gupta; Tushar Krishna
- Corresponding author: Tushar Krishna
Other grants held by Tushar Krishna
Collaborative Research: Frameworks: Advancing Computer Hardware and Systems' Research Capability, Reproducibility, and Sustainability with the gem5 Simulator Ecosystem
- Award number: 2311892
- Fiscal year: 2023
- Funding amount: $500,000
- Project type: Standard Grant
Student Travel Support for the 2018 Parallel Architectures and Compilation Techniques (PACT-18) Conference
- Award number: 1842928
- Fiscal year: 2018
- Funding amount: $500,000
- Project type: Standard Grant
CRII: SHF: Enabling Neuroevolution in Hardware
- Award number: 1755876
- Fiscal year: 2018
- Funding amount: $500,000
- Project type: Standard Grant
Student Travel Support for the 2017 International Symposium on Computer Architecture (ISCA-44)
- Award number: 1738358
- Fiscal year: 2017
- Funding amount: $500,000
- Project type: Standard Grant
Similar National Natural Science Foundation of China (NSFC) grants
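Molecular mechanism by which cholesterol hydroxylase CH25H promotes degradation of the hepatitis B virus proteins Core and Pre-core independently of its enzymatic activity
- Award number: 82371765
- Approval year: 2023
- Funding amount: CNY 500,000
- Project type: General Program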
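Development of 5f-in-core GTH pseudopotentials and basis sets for actinide elements
- Award number: 22303037
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund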
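Mechanistic study of Core-matched prodrug co-assemblies built on a synthetic lethality strategy to overcome tumor drug resistance
- Award number:
- Approval year: 2022
- Funding amount: CNY 520,000
- Project type: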
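Mechanism by which the Salmonella Typhimurium LPS core promotes dendritic cell migration via CD209/SphK1 and aggravates inflammatory bowel disease
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund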
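Application and mechanism of a core-shell strategy for synchronously vascularized bone tissue engineering based on precise exosome regulation
- Award number:
- Approval year: 2020
- Funding amount: CNY 550,000
- Project type: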
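Precise preparation and functional exploration of Core M3 mannosyl peptides of dystroglycan
- Award number: 92053110
- Approval year: 2020
- Funding amount: CNY 700,000
- Project type: Major Research Plan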
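Molecular mechanism by which deficiency of Core 1 O-glycan mucins induces gastritis and mediates the progression of chronic gastritis to gastric cancer
- Award number: 81902805
- Approval year: 2019
- Funding amount: CNY 205,000
- Project type: Young Scientists Fund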
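Core-merging giant impact events in the late accretion stage of the proto-Earth: new insights into core accretion, core-mantle equilibration, and core-mantle boundary structure
- Award number: 41973063
- Approval year: 2019
- Funding amount: CNY 650,000
- Project type: General Program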
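Workshop on CORDEX-CORE regional climate simulation and projection
- Award number: 41981240365
- Approval year: 2019
- Funding amount: CNY 15,000
- Project type: International (Regional) Cooperation and Exchange Program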
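RBM38 regulates HBV replication by assisting Pol-ε binding and core recruitment
- Award number: 31900138
- Approval year: 2019
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund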
Similar overseas grants
SHF: Core: Small: Real-time and Energy-Efficient Machine Learning for Robotics Applications
- Award number: 2341183
- Fiscal year: 2023
- Funding amount: $500,000
- Project type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2333899
- Fiscal year: 2023
- Funding amount: $500,000
- Project type: Standard Grant
CCF: SHF: CORE: Small: Towards Systematic Quality Control of Physically Unclonable Functions (PUFs)
- Award number: 2244479
- Fiscal year: 2023
- Funding amount: $500,000
- Project type: Standard Grant
Collaborative Research: SHF: Core: Medium: Program Synthesis for Schema Changes
- Award number: 2210831
- Fiscal year: 2022
- Funding amount: $500,000
- Project type: Standard Grant
Collaborative Research: SHF: Core: Medium: Program Synthesis for Schema Changes
- Award number: 2210832
- Fiscal year: 2022
- Funding amount: $500,000
- Project type: Standard Grant
SHF: Core: Small: Real-time and Energy-Efficient Machine Learning for Robotics Applications
- Award number: 2128036
- Fiscal year: 2021
- Funding amount: $500,000
- Project type: Standard Grant
SHF CORE: Small: Hybrid NLP and Formal Techniques for Synthesizing Assertions and Identifying Ambiguities from English
- Award number: 2101021
- Fiscal year: 2021
- Funding amount: $500,000
- Project type: Standard Grant
Collaborative Research: SHF: Core: Medium: Causal Performance Debugging for Highly-Configurable Systems
- Award number: 2106853
- Fiscal year: 2021
- Funding amount: $500,000
- Project type: Standard Grant
CISE Core: CCF: SHF: Small: Future-Proof Test Corpus Synthesis for Evolving Software
- Award number: 2120955
- Fiscal year: 2021
- Funding amount: $500,000
- Project type: Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
- Award number: 2007775
- Fiscal year: 2020
- Funding amount: $500,000
- Project type: Standard Grant