FoMR: DeepFetch: Compact Deep Learning based Prefetcher on Configurable Hardware

Basic Information

Award number:
1912680
Principal investigator:
Viktor Prasanna
Amount:
$200,000
Awardee institution:
University of Southern California
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-10-01 to 2022-09-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1912680&HistoricalAwards=false
Keywords:
FoMR DeepFetch Compact Deep Learning

Project Abstract

Fast computer processors, tensor processing units, hardware accelerators, and heterogeneous architectures have enabled large-scale speed-ups in computational power, but memory speeds have not kept pace. Memory performance has therefore become the bottleneck in many applications that rely on heavy memory access. Several emerging memory technologies, such as 3D-stacked dynamic random-access memory (3D-DRAM) and non-volatile memory, attempt to address memory bottlenecks from a hardware perspective, but with a tradeoff among bandwidth, power, latency, and cost. Rather than redesigning existing algorithms to suit a specific memory technology, this project will develop a machine-learning-based approach that automatically learns access patterns, which may then be used to prefetch data optimally. Specifically, highly compact long short-term memory (LSTM) models will serve as the centerpiece of the prefetcher, predicting memory accesses. Through novel model compression techniques, hierarchical memory modeling, and dedicated hardware, this project will overcome barriers to fully exploiting machine learning and emerging hardware to improve prefetching.
Successful completion of this project will lead to improved memory performance for applications including signal processing, computer vision, and language processing. A practical LSTM-based prefetcher implementation on hardware must deal with several challenges that this project will address: (i) training a small model (to enable fast inference) on large traces so that it predicts memory accesses accurately across multiple applications; (ii) model compression to ensure real-time inference; (iii) retraining the model online, on demand, to learn application-specific models, which requires fast learning from a small amount of data; (iv) making prefetching decisions in real time based on the model's prediction and uncertainty, namely what, when, and where to prefetch, which also requires careful modeling of the target memory hierarchy; (vi) based on the predictions, deciding in real time whether reordering data (dynamic data layout) can improve latency and make future prefetches more effective; (vii) mapping the prediction and decision-making framework onto the limited configurable hardware available, ensuring low-latency training and high-throughput prefetching within a small area and power budget.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
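
The predict-then-decide loop the abstract describes (predict the next access, then prefetch only when the model is confident enough) can be illustrated with a minimal sketch. This is not the project's implementation: a simple delta-frequency table stands in for the compact LSTM, and the 64-byte block size, the 0.6 confidence threshold, and all names (`to_deltas`, `DeltaTablePredictor`, `prefetch_candidate`) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

BLOCK_BITS = 6  # assumption: 64-byte cache blocks


def to_deltas(addresses):
    """Convert byte addresses to block-level deltas between consecutive accesses."""
    blocks = [a >> BLOCK_BITS for a in addresses]
    return [b - a for a, b in zip(blocks, blocks[1:])]


class DeltaTablePredictor:
    """Stand-in for a learned model: counts which delta follows each delta."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, deltas):
        for prev, nxt in zip(deltas, deltas[1:]):
            self.table[prev][nxt] += 1

    def predict(self, prev_delta):
        """Return (most likely next delta, confidence), or (None, 0.0) if unseen."""
        counts = self.table.get(prev_delta)
        if not counts:
            return None, 0.0
        delta, hits = counts.most_common(1)[0]
        return delta, hits / sum(counts.values())


def prefetch_candidate(predictor, prev_addr, curr_addr, threshold=0.6):
    """Return the byte address to prefetch, or None when confidence is too low."""
    prev_delta = (curr_addr >> BLOCK_BITS) - (prev_addr >> BLOCK_BITS)
    delta, confidence = predictor.predict(prev_delta)
    if delta is None or confidence < threshold:
        return None  # skip the prefetch rather than pollute the cache
    return ((curr_addr >> BLOCK_BITS) + delta) << BLOCK_BITS
```

Trained on a sequential trace of 64-byte strides, the sketch proposes the next block after two consecutive accesses and declines to prefetch after a jump it has never seen. A learned model would replace the table, but the thresholded decision step would look broadly similar.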

Project Outcomes

Journal papers (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

RAOP: Recurrent Neural Network Augmented Offset Prefetcher

DOI:
10.1145/3422575.3422807
Published:
2020-09
Journal:
Proceedings of the International Symposium on Memory Systems
Impact factor:
0
Authors:
Pengmiao Zhang;Ajitesh Srivastava;Benjamin Brooks;R. Kannan;V. Prasanna
Corresponding authors:
Pengmiao Zhang;Ajitesh Srivastava;Benjamin Brooks;R. Kannan;V. Prasanna

SHARP: Software Hint-Assisted Memory Access Prediction for Graph Analytics

DOI:
10.1109/hpec55821.2022.9926307
Published:
2022-09
Journal:
2022 IEEE High Performance Extreme Computing Conference (HPEC)
Impact factor:
0
Authors:
Pengmiao Zhang;R. Kannan;Xiangzhi Tong;Anant V. Nori;V. Prasanna
Corresponding authors:
Pengmiao Zhang;R. Kannan;Xiangzhi Tong;Anant V. Nori;V. Prasanna

ReSemble: reinforced ensemble framework for data prefetching

DOI:
Published:
2022
Journal:
Storage and Analysis
Impact factor:
0
Authors:
Zhang, Pengmiao;Kannan, Rajgopal;Srivastava, Ajitesh;Nori, Anant V.;Prasanna, Viktor K.
Corresponding author:
Prasanna, Viktor K.

TransforMAP: Transformer for Memory Access Prediction

DOI:
Published:
2021
Journal:
International Symposium on Computer Architecture
Impact factor:
0
Authors:
Zhang, Pengmiao;Srivastava, Ajitesh;Kannan, Rajgopal;Nori, Anant V.;Prasanna, Viktor K.
Corresponding author:
Prasanna, Viktor K.

MemMAP: Compact and Generalizable Meta-LSTM Models for Memory Access Prediction