SHF: AF: Small: Algorithms and a Code Generator for Faster Stencil Computations

SHF:AF:Small:用于更快模板计算的算法和代码生成器

基本信息

  • 批准号:
    2318633
  • 负责人:
  • 金额:
    $ 59.3万
  • 依托单位:
  • 依托单位国家:
    美国
  • 项目类别:
    Standard Grant
  • 财政年份:
    2023
  • 资助国家:
    美国
  • 起止时间:
    2023-10-01 至 2026-09-30
  • 项目状态:
    未结题

项目摘要

A wide range of application areas across industry and scientific computing use stencil computations, including the simulation of mechanical systems, traffic flows, meteorology, stochastic and fractional differential equations, chemistry, modeling of erosion, fluid dynamics, quantitative finance, physical simulations, and even cellular automata. A stencil defines the value of a grid cell in a spatial grid at any time step t by combining the values of a fixed constant-sized set of neighboring cells at recent time steps earlier than t. Cells near the boundary of a bounded grid are defined by additional conditions, since they lack the neighboring cells necessary for the stencil to use. A stencil computation is one in which a grid filled with initial data is evolved by repeatedly applying a stencil on the grid cells for a given number of time steps. Stencil computations are easy to implement using nested loops and such implementations perform work proportional to N*T for updating a grid of size N for T time steps. Running on multiple processing cores of a multicore machine and/or using cache-efficient techniques to reduce costly accesses to the slow computer memory (random access memory, RAM) during the computation can result in much smaller running times. But the work (a.k.a. computational complexity) which represents the total resource usage (e.g., total energy consumed or the total processing time across all processing cores) still remains proportional to N*T. In contrast, building on the recent prior work by the investigator's research group, this project will improve over many of these state-of-the-art stencil algorithms not only by performing a significantly less amount of work W(N,T) for any given N and T, but also guaranteeing a more desirable property -- the ratio W(N,T)/(N*T) decreases with the increase of N*T. The resulting algorithms will not only run faster than existing algorithms on modern multicore computers, they will significantly reduce the overall resource (e.g., time, energy) usage of many classes of stencil computations. A code generator that can automatically produce highly efficient correct parallel implementations of the new algorithms from simple specifications of stencil computations will be implemented. An autotuner for further automatic performance optimization of those implementations on any target multicore machine will also be built. Thus, this project will bring high-performing resource-saving stencil computations to a wider audience of scientists, students, and educators, without requiring them to fully understand the complicated algorithms or invest time and effort necessary to implement such algorithms on modern computing systems. Findings and research problems from this project will be integrated in the advanced graduate courses. Undergraduate and graduate students will be involved in the research project.The investigator's research group has recently shown polynomial improvements in computational complexity over state-of-the-art stencil algorithms for linear stencils with arbitrary aperiodic boundary conditions using the Fast Fourier Transform (FFT). Stencils that define grid values for the current time step as a linear (resp. nonlinear) function of grid values from earlier time steps are called linear (resp. nonlinear) stencils. Very recently the group has also shown that linear stencil computations under the freespace boundary condition can be well-approximated very efficiently in work and space through a novel use of Gaussian approximations and n-body computations. Implementations of most of these new algorithms run significantly faster than the fastest implementations of existing algorithms. This project will build on those results and explore other novel ideas to tackle the following algorithm design challenges: (1) fast approximation algorithms for linear periodic/aperiodic stencil computations, (2) faster exact algorithms for linear stencils under special boundary conditions (e.g., Dirichlet, Neumann), (3) fast algorithms for classes of nonlinear stencils, and (4) fast algorithms for heterogeneous stencil grids. Designing algorithms with improved computational complexity will be the core challenge and objective across all classes of stencils. Further improvements will come from reducing the span and cache (or input/output, I/O) complexity. If there cannot exist algorithms with improved bounds (work, span, or I/O) for certain classes, the aim will be to prove asymptotically tight lower bounds. In many time-sensitive simulations (e.g., weather forecasting) fast algorithms with approximately correct results are more desirable than slow algorithms with exact results. Hence, fast approximation algorithms will be designed for potential use in such cases. A specialized software stack named "Saltar" will be built to generate efficient multicore implementations of the algorithms to be designed in this project. The Saltar domain-specific language (DSL) will allow users to write down simple specifications of the stencil problems they would like to solve, and the Saltar code generator will take the specification to generate highly optimized multithreaded OpenCilk/OpenMP code. The Saltar autotuner will be built for automatic offline tuning of the parallel recursive divide-and-conquer implementations generated by the code generator. Saltar will be released under an open-source license (BSD 3-clause or similar permissive licensing allowing use by industry). A public web interface will be built to evaluate the success of the project and the usability of Saltar, and a benchmark suite will be created by collecting stencil problems from various sources including potentially from the web users (with permission).This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
工业和科学计算的广泛应用领域使用模板计算,包括机械系统的模拟,交通流量,气象学,随机和分数微分方程,化学,侵蚀建模,流体动力学,定量金融,物理模拟,甚至元胞自动机。模板通过组合在比t更早的最近时间步长处的固定恒定大小的相邻单元集合的值来定义空间网格中的网格单元在任何时间步长t处的值。有界网格边界附近的单元格由附加条件定义,因为它们缺少模具使用所需的相邻单元格。模板计算是其中填充有初始数据的网格通过在给定数量的时间步长内在网格单元上重复应用模板来演化的计算。使用嵌套循环很容易实现Stenointment计算,并且这样的实现执行与N*T成比例的工作,用于更新T个时间步长的大小为N的网格。在多核机器的多个处理核上运行和/或使用高速缓存高效技术来减少计算期间对慢速计算机存储器(随机存取存储器,RAM)的高成本访问可以导致小得多的运行时间。工作(A.K.A.)计算复杂度),其表示总的资源使用(例如,所消耗的总能量或所有处理核上的总处理时间)仍然保持与N*T成比例。相比之下,在研究人员的研究小组最近先前工作的基础上,该项目将改进许多这些最先进的模板算法,不仅通过对任何给定的N和T执行显著更少的工作量W(N,T),而且还保证了更理想的属性-比值W(N,T)/(N*T)随着N*T的增加而减小。所得到的算法不仅将比现代多核计算机上的现有算法运行得更快,它们还将显著减少总体资源(例如,时间、能量)使用许多种类的模板计算。一个代码生成器,可以自动产生高效率的正确的并行实现的新算法,从简单的规格模板计算将实施。还将构建一个自动调谐器,用于在任何目标多核机器上进一步自动优化这些实现。因此,该项目将为更广泛的科学家,学生和教育工作者带来高性能的资源节约模板计算,而不需要他们完全理解复杂的算法或投入必要的时间和精力在现代计算系统上实现这些算法。该项目的发现和研究问题将纳入高级研究生课程。本科生和研究生将参与该研究项目。研究人员的研究小组最近显示,使用快速傅里叶变换(FFT),在计算复杂性方面,与最先进的模板算法相比,线性模板算法具有任意非周期边界条件。将当前时间步的网格值定义为线性(或来自较早时间步的网格值的非线性)函数被称为线性(或非线性)拉伸。最近,该小组还表明,在自由空间边界条件下的线性模板计算可以很好地近似非常有效地在工作和空间,通过一种新的使用高斯近似和n体计算。这些新算法中的大多数的实现运行速度明显快于现有算法的最快实现。这个项目将建立在这些结果的基础上,并探索其他新的想法,以解决以下算法设计挑战:(1)线性周期/非周期模板计算的快速近似算法,(2)特殊边界条件下线性模板计算的快速精确算法(例如,Dirichlet,Neumann),(3)快速算法的类的非线性stenetrance,和(4)快速算法的异构模板网格。设计具有改进的计算复杂性的算法将是所有类别的支架的核心挑战和目标。进一步的改进将来自减少跨度和缓存(或输入/输出,I/O)复杂性。如果不存在对某些类具有改进的边界(工作,跨度或I/O)的算法,那么目标将是证明渐近紧的下界。在许多时间敏感的模拟中(例如,天气预报)具有近似正确结果的快速算法比具有精确结果的慢速算法更理想。因此,快速近似算法将被设计用于这种情况下的潜在用途。一个专门的软件栈命名为“Saltar”将生成高效的多核实现的算法设计在这个项目中。Saltar领域特定语言(DSL)将允许用户写下他们想要解决的模板问题的简单规范,Saltar代码生成器将采用该规范来生成高度优化的多线程OpenCilk/OpenMP代码。Saltar autotuner将用于自动离线调整代码生成器生成的并行递归分治实现。Saltar将在开源许可证(BSD 3条款或类似的允许行业使用的许可证)下发布。将建立一个公共网络界面来评估该项目的成功和Saltar的可用性,并将通过从各种来源收集模板问题来创建一个基准套件,包括潜在的网络用户(经许可)。该奖项反映了NSF的法定使命,并被认为值得通过使用基金会的智力价值和更广泛的影响审查标准进行评估来支持。

项目成果

期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ monograph.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ sciAawards.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ conferencePapers.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ patent.updateTime }}

Rezaul Chowdhury其他文献

Rezaul Chowdhury的其他文献

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

{{ truncateString('Rezaul Chowdhury', 18)}}的其他基金

CAREER: A Unified Framework for Designing Efficient Resource-Oblivious Parallel Algorithms
职业:设计高效的资源无关并行算法的统一框架
  • 批准号:
    1553510
  • 财政年份:
    2016
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Continuing Grant
SHF: AF: Medium: Collaborative Research: The Pochoir Stencil Compiler
SHF:AF:媒介:协作研究:Pochoir 模板编译器
  • 批准号:
    1162196
  • 财政年份:
    2012
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Continuing Grant

相似国自然基金

基于前瞻性队列的双酚AF联合果糖加重代谢损伤的靶向代谢组学研究
  • 批准号:
    2025JJ30049
  • 批准年份:
    2025
  • 资助金额:
    0.0 万元
  • 项目类别:
    省市级项目
U2AF2-circMMP1信号轴促进结直肠癌进展的分子机制研究
  • 批准号:
    2025JJ80723
  • 批准年份:
    2025
  • 资助金额:
    0.0 万元
  • 项目类别:
    省市级项目
U2AF2精氯酸甲基化调控RNA转录合成在MTAP缺失骨肉瘤T细胞耗竭中的机制研究
  • 批准号:
  • 批准年份:
    2024
  • 资助金额:
    0 万元
  • 项目类别:
    青年科学基金项目
BDA-366通过MYD88/NF-κB/PGC1β通路杀伤 KMT2A/AF9 AML细胞的机制研究
  • 批准号:
  • 批准年份:
    2024
  • 资助金额:
    15.0 万元
  • 项目类别:
    省市级项目
Lu AF21934减少缺血性脑卒中导致的神经损伤的机制研究
  • 批准号:
  • 批准年份:
    2024
  • 资助金额:
    0.0 万元
  • 项目类别:
    省市级项目
H2S介导剪接因子BraU2AF65a的S-巯基化修饰促进大白菜开花的分子机制
  • 批准号:
    32372727
  • 批准年份:
    2023
  • 资助金额:
    50 万元
  • 项目类别:
    面上项目
AF9通过ARRB2-MRGPRB2介导肠固有肥大细胞活化促进重症急性胰腺炎发生MOF的研究
  • 批准号:
    82300739
  • 批准年份:
    2023
  • 资助金额:
    30 万元
  • 项目类别:
    青年科学基金项目
剪接因子U2AF1突变在急性髓系白血病原发耐药中的机制研究
  • 批准号:
    82370157
  • 批准年份:
    2023
  • 资助金额:
    49 万元
  • 项目类别:
    面上项目
线粒体活性氧介导的胎盘早衰在孕期双酚AF暴露致婴幼儿神经发育迟缓中的作用
  • 批准号:
    82304160
  • 批准年份:
    2023
  • 资助金额:
    30 万元
  • 项目类别:
    青年科学基金项目
U2AF2-circMMP1调控能量代谢促进结直肠癌肝转移的分子机制
  • 批准号:
    82303789
  • 批准年份:
    2023
  • 资助金额:
    30 万元
  • 项目类别:
    青年科学基金项目

相似海外基金

AF: Small: Problems in Algorithmic Game Theory for Online Markets
AF:小:在线市场的算法博弈论问题
  • 批准号:
    2332922
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
Collaborative Research: AF: Small: New Directions in Algorithmic Replicability
合作研究:AF:小:算法可复制性的新方向
  • 批准号:
    2342244
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
Collaborative Research: AF: Small: Exploring the Frontiers of Adversarial Robustness
合作研究:AF:小型:探索对抗鲁棒性的前沿
  • 批准号:
    2335411
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
NSF-BSF: Collaborative Research: AF: Small: Algorithmic Performance through History Independence
NSF-BSF:协作研究:AF:小型:通过历史独立性实现算法性能
  • 批准号:
    2420942
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
Collaborative Research: AF: Small: Structural Graph Algorithms via General Frameworks
合作研究:AF:小型:通过通用框架的结构图算法
  • 批准号:
    2347322
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
Collaborative Research: AF: Small: Real Solutions of Polynomial Systems
合作研究:AF:小:多项式系统的实数解
  • 批准号:
    2331401
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
AF: Small: Verification Complexities of Self-Assembly Systems
AF:小:自组装系统的验证复杂性
  • 批准号:
    2329918
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
Collaborative Research: AF: Small: Real Solutions of Polynomial Systems
合作研究:AF:小:多项式系统的实数解
  • 批准号:
    2331400
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
Collaborative Research: AF: Small: New Connections between Optimization and Property Testing
合作研究:AF:小型:优化和性能测试之间的新联系
  • 批准号:
    2402572
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
AF: SMALL: Submodular Functions and Hypergraphs: Partitioning and Connectivity
AF:SMALL:子模函数和超图:分区和连接
  • 批准号:
    2402667
  • 财政年份:
    2024
  • 资助金额:
    $ 59.3万
  • 项目类别:
    Standard Grant
{{ showInfoDetail.title }}

作者:{{ showInfoDetail.author }}

知道了