Compiler Optimizations for RTM-based computing systems

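Basic information

Grant number: 450944241
Principal investigator: Professor Dr.-Ing. Jeronimo Castrillon
Amount: --
Host institution: Institut für Technische Informatik
Host institution country: Germany
Project category: Research Grants
Fiscal year:
Funding country: Germany
Duration:
Project status: Ongoing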

Source: https://gepris.dfg.de/gepris/projekt/450944241?language=en
Keywords: Compiler Optimizations RTM based computing

Project abstract

Computing systems have been undergoing a remarkable evolution since the end of Dennard scaling and in the face of the current limitations of CMOS technologies. In addition to new computing paradigms, several new memory technologies are being proposed to replace or augment traditional random access memories (RAM). Among them, racetrack memories (RTMs) are an exciting non-volatile memory technology that promises the density of hard-disk drives with a latency somewhere between that of static RAM (SRAM) and dynamic RAM (DRAM). A fundamental difference of RTMs is that they store multiple bits sequentially per access transistor, as opposed to one bit in SRAM and DRAM. This makes the latency and energy needed to access data depend on where the bits are located in the sequential bit stream, creating a new kind of spatial locality in which the distance between consecutively accessed memory offsets must be minimized to improve performance and save energy. While compilers have long targeted temporal and spatial locality in the classical sense, there is no established theory or set of algorithms to handle the sequential nature of RTMs. This project proposes novel compiler analyses and optimizations for RTM-based computing systems, focusing on the concrete case of nested loop programs from the domains of linear algebra, machine learning and physics simulations. We propose extensions to polyhedral compilers to analyze profitable memory access patterns and to transform the program by changing the data layout and the operation schedule. The main goal of these transformations is to produce a semantics-preserving memory access trace in which the distances between consecutive accesses are minimized. We then leverage the higher-level semantics of domain-specific languages (DSLs) for tensor expressions, which map naturally to nested loop programs. DSLs offer more degrees of freedom for optimization, since the data layout can be chosen more freely and known algebraic properties of operators enable coarser-grained transformations.
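To make the idea of minimizing the distance between consecutive accesses concrete, here is a minimal sketch under a deliberately simplified cost model. The model is an assumption, not the project's own: a single track, one access port that starts aligned with offset 0, and a cost equal to the number of one-position shifts. The helper names (`shift_cost`, `row_major`, `col_major`, `loop_trace`) are hypothetical. The sketch compares a two-level loop nest over a 4x4 array under two loop orders (operation schedules) and two data layouts.

```python
def shift_cost(offsets, start=0):
    """Count one-position shifts needed to serve a sequence of offsets on a
    single track with one access port initially aligned with `start`.
    Simplified model (assumption): shifting by d positions costs d."""
    cost = 0
    port = start
    for off in offsets:
        cost += abs(off - port)  # move the track so `off` sits under the port
        port = off
    return cost


def row_major(i, j, n_cols):
    """Offset of A[i][j] when rows are stored contiguously on the track."""
    return i * n_cols + j


def col_major(i, j, n_rows):
    """Offset of A[i][j] when columns are stored contiguously on the track."""
    return j * n_rows + i


def loop_trace(n_rows, n_cols, interchange=False):
    """(i, j) pairs visited by a 2-level loop nest that touches A[i][j].
    interchange=False: i outer, j inner; True: j outer, i inner."""
    if interchange:
        return [(i, j) for j in range(n_cols) for i in range(n_rows)]
    return [(i, j) for i in range(n_rows) for j in range(n_cols)]


N = 4  # 4x4 array, all 16 elements assumed to live on one track
for interchange in (False, True):
    trace = loop_trace(N, N, interchange)
    rm = shift_cost([row_major(i, j, N) for i, j in trace])
    cm = shift_cost([col_major(i, j, N) for i, j in trace])
    order = "j outer, i inner" if interchange else "i outer, j inner"
    print(f"{order}: row-major={rm} shifts, column-major={cm} shifts")
# i outer, j inner: row-major=15 shifts, column-major=81 shifts
# j outer, i inner: row-major=81 shifts, column-major=15 shifts
```

Either transformation alone recovers the unit-distance trace: interchanging the loops (a schedule change) or switching to the layout that matches the traversal order (a data-layout change). Actual RTM designs may use several access ports per track and group many tracks together, so this single-port shift count is only a rough proxy for latency and energy.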
Optimizations in this project will target not only performance and energy consumption, but also the interesting trade-off between these standard metrics and the capacity offered by RTMs. We expect this project to lay the groundwork for future compilers for RTM-based systems and to provide valuable system-level feedback to computer architects and perhaps materials scientists.
