DIADEM: debugging made dependable and measurable
Basic information
- Grant number: EP/W012308/1
- Principal investigator:
- Amount: $413,900
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2022
- Funding country: United Kingdom
- Duration: 2022 to (no data)
- Project status: ongoing
- Source:
- Keywords:
Project abstract
Software quality is increasingly critical to most of humanity. Bugs in software exact a huge annual toll, financially and even in human life. To eliminate bugs, developers depend crucially on their tools, and tools for interactive debugging are vital: they alone provide a high-level (source-level) view of a running (binary) program, enabling programmers to 'see the bug' as it occurs in the program running in front of them. However, debugging infrastructure is notoriously unreliable, as it works only if various metadata are complete and correct. If not, the programmer sees a partial or incorrect view, which may be useless or actively misleading.

These problems occur often in popular languages (e.g. C, C++, Rust, Go), owing to a tension between debuggability and optimisation. Debugging in these languages works via compiler-generated /metadata/ describing how binary (executable) code relates to source (human-written) code. Metadata generation is 'best-effort', and optimisation frequently introduces flaws -- but simply disabling optimisations is seldom an option. Programmers rely on optimisations to relieve them of much hand-tuning; without them, code may run tens of times slower. Furthermore, some bugs appear only in optimised code, owing to undefined behaviour (underspecification) in the source language.

This problem is extremely challenging. At its heart, writing compiler optimisations that preserve debuggability demands extra effort on what are already intricate code transformations ('passes'). In practice, corners are cut, leaving the output metadata approximate. To be acceptable to pass authors, improvements must reshape the effort/reward curve without increasing the task's baseline complexity.
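To make the tension concrete, the sketch below models, in deliberately simplified form, the kind of address-to-source-line mapping that debug metadata (e.g. a DWARF-style line table) provides. All names, addresses, and line numbers here are hypothetical illustrations, not taken from any real compiler's output; the point is only to show how an optimisation pass that fails to maintain an entry leaves a gap where the debugger can no longer relate machine state back to source.

```python
# Hypothetical model of compiler-generated line-table metadata: each
# entry maps a half-open range of binary addresses to a source line.
# Values are illustrative only, not from any real compiler.
line_table = [
    # (start_addr, end_addr, source_line)
    (0x1000, 0x1008, 12),
    (0x1008, 0x1010, 13),
    # Suppose an optimisation pass merged the instructions for line 14
    # without updating the metadata: 0x1010-0x1020 is now unmapped.
    (0x1020, 0x1028, 15),
]

def source_line_for(addr):
    """Return the source line for a binary address, or None when the
    metadata has a gap (the debugger can show only raw machine state)."""
    for start, end, line in line_table:
        if start <= addr < end:
            return line
    return None

print(source_line_for(0x100C))  # inside a mapped range -> 13
print(source_line_for(0x1014))  # inside the gap left by the pass -> None
```

A debugger stopping at `0x1014` in this model has no source position to report, which is exactly the 'partial or incorrect view' the abstract describes.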
Unlike performance, debugging so far lacks quantitative benchmarks, so compiler authors have not prioritised or competed on debuggability. Existing techniques amount to interaction-based testing of debugging, often with features for narrowing down which passes introduced a flaw. This is haphazard, since exploring all metadata subsumes the already-hard problem of achieving full-coverage tests (to 'drive' the debugger over all program locations). We propose instead to analyse metadata as an artifact in its own right. This means that instead of tests that interact with a single concrete execution through a debugger, we must devise a custom systematic, symbolic method for exploring the compiled code, evaluating the correctness of metadata in a mathematical manner. Unlike haphazard testing, this promises systematic measurement of lost coverage and correctness; the latter can (we hypothesise) be automated using recent advances in formal specification of source languages, namely /executable semantics/, as a replacement for current manual practices. This idea of parallel source- and binary-level exploration also suggests a radical approach: post-hoc synthesis of metadata, relieving the compiler of generating it at all. The idea builds on successful work on neighbouring problems (translation validation and decompilation).

The project will proceed by practical methods, experimenting on a real production compiler (LLVM). It will build novel tools embeddable into existing compiler-testing workflows, both to diagnose compiler bugs and to quantify the improvement from fixing them. It will empirically explore the abstractions and helpers used internally in compilers, to devise designs making them measurably more debug-preserving. Finally, it will build a novel tool exploring the radical idea of synthesising high-quality metadata post hoc, outside the compiler, and will develop metrics allowing quantitative comparison against traditional approaches.
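The 'systematic measurement of lost coverage' mentioned above can be sketched as a simple metric. The following is a minimal, hypothetical illustration (function name, addresses, and ranges are all invented for the example): given the instruction addresses of a compiled function and the address ranges over which the metadata describes where some variable lives, compute the fraction of addresses at which a debugger could actually recover that variable.

```python
# Hypothetical coverage metric: the fraction of a function's instruction
# addresses at which the debug metadata describes a variable's location.
# All addresses and ranges below are illustrative, not real compiler output.

def location_coverage(instruction_addrs, location_ranges):
    """location_ranges: half-open (start, end) address ranges where the
    variable's location is described. Returns the covered fraction in [0, 1]."""
    covered = sum(
        1
        for addr in instruction_addrs
        if any(start <= addr < end for start, end in location_ranges)
    )
    return covered / len(instruction_addrs)

addrs = list(range(0x2000, 0x2010))            # 16 instruction addresses
ranges_for_x = [(0x2000, 0x2004), (0x2008, 0x2010)]  # where 'x' is described
print(location_coverage(addrs, ranges_for_x))  # 12 of 16 covered -> 0.75
```

A score below 1.0 quantifies exactly how much debuggability an optimising compilation lost for that variable, which is the kind of number that would let compiler authors compete on debuggability the way they already do on performance.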
The beneficiaries sit at many levels: compiler authors, software developers at large, and the general public who use or depend on the affected software.
Project outcomes
- Journal papers: 1
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Accurate Coverage Metrics for Compiler-Generated Debugging Information
- DOI: 10.1145/3640537.3641578
- Published: 2024
- Journal:
- Impact factor: 0
- Author: Stinnett J
- Corresponding author: Stinnett J
Other works by Stephen Kell
Rethinking software connectors
- DOI: 10.1145/1294917.1294918
- Published: 2007-09
- Journal:
- Impact factor: 0
- Author: Stephen Kell
- Corresponding author: Stephen Kell

Black-box composition of mismatched software components
- DOI:
- Published: 2012
- Journal:
- Impact factor: 0
- Author: Stephen Kell
- Corresponding author: Stephen Kell

Convivial design heuristics for software systems
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Author: Stephen Kell
- Corresponding author: Stephen Kell

Reliable and fast DWARF-based stack unwinding
- DOI: 10.1145/3360572
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: T. Bastian; Stephen Kell; Francesco Zappa Nardelli
- Corresponding author: Francesco Zappa Nardelli

A Survey of Practical Software Adaptation Techniques
- DOI: 10.3217/jucs-014-13-2110
- Published: 2008
- Journal:
- Impact factor: 0
- Author: Stephen Kell
- Corresponding author: Stephen Kell
Similar international grants
CAREER: FET: A Top-down Compilation Infrastructure for Optimization and Debugging in the Noisy Intermediate Scale Quantum (NISQ) era
- Grant number: 2421059
- Fiscal year: 2024
- Amount: $413,900
- Category: Continuing Grant

CAREER: Advancing Neural Testing and Debugging of Software
- Grant number: 2238045
- Fiscal year: 2023
- Amount: $413,900
- Category: Continuing Grant

An Individual Investigator Development Plan to Improve Undergraduate Debugging Skills and Mindset
- Grant number: 2321255
- Fiscal year: 2023
- Amount: $413,900
- Category: Standard Grant

Utilizing Artificial Intelligence to Improve the Testing and Debugging of Concurrent Software
- Grant number: RGPIN-2018-06588
- Fiscal year: 2022
- Amount: $413,900
- Category: Discovery Grants Program - Individual

Testing and Debugging Machine Learning-based Autonomous Systems
- Grant number: RGPIN-2020-04035
- Fiscal year: 2022
- Amount: $413,900
- Category: Discovery Grants Program - Individual

Inferring rich input structure for software debugging and defence
- Grant number: RGPIN-2020-06394
- Fiscal year: 2022
- Amount: $413,900
- Category: Discovery Grants Program - Individual

Testing, Debugging and Repairing Machine Learning Software at the System Level
- Grant number: RGPAS-2021-00034
- Fiscal year: 2022
- Amount: $413,900
- Category: Discovery Grants Program - Accelerator Supplements

Monitoring and Debugging of High Performance Distributed Heterogeneous Cloud Applications
- Grant number: 554158-2020
- Fiscal year: 2022
- Amount: $413,900
- Category: Alliance Grants

Testing, Debugging and Repairing Machine Learning Software at the System Level
- Grant number: RGPIN-2021-02549
- Fiscal year: 2022
- Amount: $413,900
- Category: Discovery Grants Program - Individual

Reinventing the tuning and debugging tools for multi-thousand cores computer systems
- Grant number: RGPIN-2017-05634
- Fiscal year: 2022
- Amount: $413,900
- Category: Discovery Grants Program - Individual