权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

EAGER: Proof-Carrying Code Completions

EAGER：携带证明的代码完成

基本信息

批准号：
2403762
负责人：
Caleb Stanford
金额：
$ 30万
依托单位：
University of California-Davis
依托单位国家：
美国
项目类别：
Standard Grant
财政年份：
2024
资助国家：
美国
起止时间：
2024-02-15 至 2025-07-31
项目状态：
未结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2403762&HistoricalAwards=false
关键词：
EAGER Proof Carrying Code Completions

项目摘要

Today's programmers are using large language models (LLMs) to accelerate software development by automatically generating code suggestions and code completions. Widely used examples include GitHub Copilot and OpenAI ChatGPT. However, code generated by these tools can have bugs that are not caught by users, and this presents a serious safety risk. This project will leverage an idea called "proof-carrying code" where code suggestions are packaged together with a mathematical proof of their safety, allowing programmers to be confident that the program is safe to deploy. This project will develop tools, techniques, and empirical results for using LLMs to generate trustworthy code together with mathematical proofs. Project outcomes, including code, data sets and course materials, will be developed in the open and made available online to researchers working on LLMs, end users of LLM-based code generation, and early industry and open source adopters.In the 1990s, researchers in the programming languages community recognized a powerful idea known as proof-carrying code (PCC): they showed how code can be shipped together with a proof of its safety that could be vetted – efficiently – by an end user. LLMs can be viewed as high-resource computations, and LLM users as low-resource entities. Seen through this lens, PCC maps naturally to the safety problem for LLM-generated code. The technical aims of this project are divided into four thrusts: (1) Gather empirical data on code that is currently generated by LLMs, and to determine core safety risks, to enable building of a dataset that will be useful to other researchers; (2) Develop a framework for PCC, including enumeration of safety properties of interest and showing how to instantiate the framework with existing program verification, proof languages, and proof frameworks; (3) Implement new tools for verification condition generation from source code for popular programming languages and for specific safety properties; and (4) Evaluate the use of LLMs for generating proofs in this context, including developing new algorithms and proof sampling techniques to improve model effectiveness. The research will lead to new insights into the current capabilities of LLMs, to new relevant safety properties for code generation in a black-box setting, and to new techniques to generate verification conditions -- to bridge the gap in formal verification technology from special-purpose languages like Coq and Dafny to general-purpose programming languages in popular use.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

今天的程序员正在使用大型语言模型（llm）来通过自动生成代码建议和代码完成来加速软件开发。广泛使用的例子包括GitHub Copilot和OpenAI ChatGPT。然而，由这些工具生成的代码可能存在用户无法捕获的错误，这带来了严重的安全风险。这个项目将利用一个称为“携带证明的代码”的想法，其中代码建议与它们的安全性的数学证明打包在一起，允许程序员确信程序是安全部署的。该项目将开发工具、技术和实证结果，用于使用法学硕士生成可信赖的代码以及数学证明。项目成果，包括代码、数据集和课程材料，将以开放的方式开发，并在线提供给从事法学硕士研究的研究人员、基于法学硕士的代码生成的最终用户，以及早期的行业和开源采用者。在20世纪90年代，编程语言社区的研究人员认识到一个强大的思想，即携带证明代码（PCC）：他们展示了如何将代码与可以由最终用户有效审查的安全性证明一起发布。LLM可以被看作是高资源计算，而LLM用户则是低资源实体。从这个角度来看，PCC自然映射到llm生成的代码的安全问题。该项目的技术目标分为四个重点：(1)收集llm当前生成的代码的经验数据，并确定核心安全风险，以便建立对其他研究人员有用的数据集；(2)为PCC开发一个框架，包括列举感兴趣的安全属性，并展示如何用现有的程序验证、证明语言和证明框架实例化该框架；(3)实施新的工具，用于从流行编程语言和特定安全属性的源代码生成验证条件；(4)评估法学硕士在这种情况下生成证明的使用，包括开发新的算法和证明采样技术，以提高模型的有效性。这项研究将为llm的当前能力带来新的见解，为黑箱环境下的代码生成提供新的相关安全属性，并为生成验证条件提供新技术，以弥合正式验证技术从Coq和Dafny等特殊用途语言到流行的通用编程语言之间的差距。该奖项反映了美国国家科学基金会的法定使命，并通过使用基金会的知识价值和更广泛的影响审查标准进行评估，被认为值得支持。