权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Scalable Automatic Verification of GPU Kernels

GPU 内核的可扩展自动验证

基本信息

批准号：
EP/K011499/1
负责人：
Alastair Donaldson
金额：
$ 12.75万
依托单位：
Imperial College London
依托单位国家：
英国
项目类别：
Research Grant
财政年份：
2013
资助国家：
英国
起止时间：
2013 至无数据
项目状态：
已结题

来源：
https://gtr.ukri.org/projects?ref=EP%2FK011499%2F1
关键词：
Scalable Automatic Verification GPU Kernels

项目摘要

Until relatively recently the processing speed of computer systems increased at an exponential rate. Each year it was possible to use computers to solve computational problems that the previous year were out of reach. Around 2005, physical limits stopped this trend: it became infeasible to increase the clock rate of a processor without consuming an exorbitant amount of energy. To counter this, processor manufacturers have since aimed to provide increased performance by designing "multicore" processors, which consist of two or more processing units on a chip. Many computational tasks are parallelisable, in which case they can be distributed across the cores of a multicore processor.Recently there has been a trend towards "many-core" processors, with hundreds or thousands of processing elements. For highly parallel applications, many-core processors can offer a massive speedup. The most readily available many-core processors are graphics processing units (GPUs) from companies such as NVIDIA, AMD, Intel, ARM and Imagination Technologies. GPUs were originally designed to accelerate graphical computations, but have become sufficiently general purpose for accelerating computational tasks in a variety of domains, including financial analysis, medical imaging, media processing and simulation.GPUs are programmed by writing a "kernel" function, a program which will be executed by hundreds or thousands of threads running in parallel on the GPU. Parallel threads can communicate using shared memory, and synchronise using barrier statements. Writing correct GPU kernels is more challenging than writing sequential software due to two main problems: data races and barrier divergence. A data race occurs when two GPU threads access a shared memory location, at least one of the threads modifies this location, and no barrier statement separates the accesses. Data races almost always signify bugs in the kernel and lead to nondeterministic behaviour. Barrier divergence occurs when distinct threads reach different barrier statements in the kernel, and leads to deadlock.Because GPUs are becoming widely used for general purpose software development, there is an urgent need for analysis techniques to help GPU programmers write correct code. Techniques to analyse GPU kernels with respect to data races and barrier divergence would significantly speed up the GPU software development process, leading to shorter time-to-market for GPU-accelerated applications.In this project we plan to design formal techniques for verifying race- and divergence-freedom for GPU kernels. To be adopted and trusted by industrial practitioners our techniques must be highly automatic, scalable, and based on rigorous semantic foundations.We plan to achieve these aims by developing a rigorous GPU memory model specification, and a formal semantics for GPU kernel execution that makes no assumptions about the structure of the kernel to be analysed. Based on these semantic foundations, we will design a verification technique that aims to prove absence of data races and barrier divergence by generating a "contract" for the kernel: a machine-checkable proof that kernel execution cannot lead to these defects. Contract-based verification is modular - each kernel procedure is analysed separately - and thus scalable. We will design a template-based contract generation method that captures domain-specific knowledge about common GPU programming idioms. This will allow efficient verification of GPU kernels that use typical data access patterns. For more intricate kernels that implement highly optimised algorithms, we will design a method based on Craig interpolation. This method will construct a proof of race- and divergence-freedom up to a bounded execution depth, and then attempt to extract a general contract from this proof.Throughout, we will evaluate our methods using open source and industrial GPU kernels, including kernels provided by our industrial collaborators.

直到最近，计算机系统的处理速度仍呈指数级增长。每年都有可能使用计算机来解决前一年无法解决的计算问题。 2005 年左右，物理限制阻止了这一趋势：在不消耗大量能源的情况下提高处理器的时钟频率变得不可行。为了解决这个问题，处理器制造商一直致力于通过设计“多核”处理器来提供更高的性能，“多核”处理器在芯片上包含两个或更多处理单元。许多计算任务是可并行的，在这种情况下，它们可以分布在多核处理器的核心上。最近出现了“多核”处理器的趋势，具有数百或数千个处理元件。对于高度并行的应用程序，多核处理器可以提供巨大的加速。最容易获得的众核处理器是来自 NVIDIA、AMD、Intel、ARM 和 Imagination Technologies 等公司的图形处理单元 (GPU)。 GPU 最初设计用于加速图形计算，但现在已经变得足够通用，可用于加速各种领域的计算任务，包括金融分析、医学成像、媒体处理和模拟。GPU 是通过编写“内核”函数进行编程的，该函数将由在 GPU 上并行运行的数百或数千个线程执行。并行线程可以使用共享内存进行通信，并使用屏障语句进行同步。由于两个主要问题，编写正确的 GPU 内核比编写顺序软件更具挑战性：数据竞争和屏障发散。当两个 GPU 线程访问共享内存位置，至少其中一个线程修改该位置，并且没有屏障语句分隔访问时，就会发生数据竞争。数据竞争几乎总是意味着内核中的错误并导致不确定的行为。当不同的线程到达内核中的不同屏障语句时，就会发生屏障发散，并导致死锁。由于 GPU 越来越广泛地用于通用软件开发，因此迫切需要分析技术来帮助 GPU 程序员编写正确的代码。分析 GPU 内核数据竞争和障碍发散的技术将显着加快 GPU 软件开发过程，从而缩短 GPU 加速应用程序的上市时间。在这个项目中，我们计划设计正式的技术来验证 GPU 内核的竞争和发散自由度。为了被工业从业者采用和信任，我们的技术必须是高度自动化、可扩展的，并且基于严格的语义基础。我们计划通过开发严格的 GPU 内存模型规范和 GPU 内核执行的正式语义来实现这些目标，该语义不对要分析的内核的结构做出任何假设。基于这些语义基础，我们将设计一种验证技术，旨在通过为内核生成“契约”来证明不存在数据争用和屏障发散：一种机器可检查的证明，证明内核执行不会导致这些缺陷。基于合约的验证是模块化的——每个内核程序都被单独分析——因此是可扩展的。我们将设计一种基于模板的合约生成方法，该方法捕获有关常见 GPU 编程习惯的特定领域知识。这将允许对使用典型数据访问模式的 GPU 内核进行有效验证。对于实现高度优化算法的更复杂的内核，我们将设计一种基于克雷格插值的方法。该方法将构建一个达到有限执行深度的竞争自由和分歧自由的证明，然后尝试从该证明中提取一般契约。自始至终，我们将使用开源和工业 GPU 内核（包括我们的工业合作者提供的内核）来评估我们的方法。