We need better tools for C, such as source browsers, bug finders, and automated refactorings. The problem is that large C systems such as Linux are software product lines, containing thousands of configuration variables that control every aspect of the software, from architecture features to file systems and drivers. The challenge of such configurability is how software tools can accurately analyze all configurations of the source without the exponential explosion of trying each one separately. To this end, we focus on two key subproblems: parsing and the build system. The contributions of this thesis are the following: (1) a configuration-preserving preprocessor and parser called SuperC that preserves configurations in its output syntax tree; (2) a configuration-preserving Makefile evaluator called Kmax that collects Linux’s compilation units and their configurations; and (3) a framework for configuration-aware analyses of source code using these tools.

C tools need to process two languages: C itself and the preprocessor. The latter improves expressivity through file includes, macros, and static conditionals. But it operates only on tokens, making it hard even to parse the two languages together. SuperC is a complete, performant solution to parsing all of C. First, a configuration-preserving preprocessor resolves includes and macros yet leaves static conditionals intact, thus preserving a program’s variability. To ensure completeness, we analyze all interactions between preprocessor features and identify techniques for correctly handling them. Second, a configuration-preserving parser generates a well-formed AST with static choice nodes for conditionals. It forks new subparsers when encountering static conditionals and merges them again after the conditionals. To ensure performance, we present a simple algorithm for table-driven Fork-Merge LR parsing and four novel optimizations. We demonstrate SuperC’s effectiveness on the x86 Linux kernel.
Large-scale C codebases like Linux are software product families, with complex build systems that