NSF-NSERC: SaTC: CORE: Small: Managing Risks of AI-generated Code in the Software Supply Chain

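Basic information

Award number:
2341206
Principal investigator:
Rachel Greenstadt
Amount:
$600,000
Host institution:
New York University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2024
Funding country:
United States
Project period:
2024-06-01 to 2027-05-31
Project status:
Ongoing

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2341206&HistoricalAwards=false
Keywords:
NSF NSERC SaTC CORE Small

Project abstract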

Modern software is created by combining pre-existing software packages into a software product. This approach is enabled by the growing popularity of the Open-Source paradigm, where the source code of software packages is made available under licenses that allow reuse. This approach speeds up software development with significant economic benefits, but also creates the risk of inadvertently importing vulnerable code into critical software tools. The risk is further compounded by the increasing use of Artificial Intelligence (AI) tools for code generation in Open-Source development. These tools must be trained on enormous amounts of data, which is not always rigorously reviewed, and thus they may learn to generate vulnerable code. To make matters worse, malicious parties may actively inject malicious code in their training set. Unfortunately, all these issues are still poorly understood. This project aims at measuring and mitigating the risks emerging from AI-generated code in the software supply chain. It will investigate how prevalent the use of AI tools is, and characterize the security risks they entail. In doing so, it will address pressing economic and societal needs: AI promises to bring significant benefits to software development, but those can only be achieved if its risks are mitigated. The research outcomes will be disseminated through workshops and hackathons, and the results will become part of curriculum and courses. The work will benefit the open-source community by producing provenance tools to improve software supply chain security. The project is a collaboration with researchers from Canada with complementary expertise that provides additional resources to the project.

Technically, the AI tools being investigated consist of various Large Language Models (LLM) for code generation. The threat model of interest is one where a developer inserts vulnerable LLM-generated code into a security-critical program, be it due to low-quality code generation or using a poisoned/backdoored LLM. This project consists of three thrusts, each addressing a research question relevant to the threat model: (i) how, and to what extent, LLM code can be distinguished from code written by humans; (ii) to what extent LLM code is already present in the supply chain, and what are its security implications; and (iii) to what extent poisoning attacks against LLM code generation can succeed in realistic conditions. In thrust (i), this project extends existing code stylometry techniques, until now used to distinguish human programmers, to the novel problem of distinguishing human- and LLM-generated code. In thrust (ii), the investigators conduct measurement studies of Open-Source software, generating empirical understanding of the presence and implications of LLM-generated code in the supply chain. Finally, thrust (iii) looks at the practical feasibility of code backdoors, and the effectiveness of automated reputation-based vetting as a defense.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Rachel Greenstadt

Challenges in Restructuring Community-based Moderation

DOI:
10.48550/arxiv.2402.17880
Publication year:
2024
Journal:
ArXiv
Impact factor:
0
Authors:
Chau Tran; Kejsi Take; Kaylea Champion; Benjamin Mako Hill; Rachel Greenstadt
Corresponding author:
Rachel Greenstadt

From User Insights to Actionable Metrics: A User-Focused Evaluation of Privacy-Preserving Browser Extensions

DOI:
Publication year:
2024
Journal:
ACM Asia Conference on Computer and Communications Security
Impact factor:
0
Authors:
Ritik Roongta; Rachel Greenstadt
Corresponding author:
Rachel Greenstadt

Stoking the Flames: Understanding Escalation in an Online Harassment Community

DOI:
10.1145/3641015
Publication year:
2024
Journal:
Proceedings of the ACM on Human-Computer Interaction
Impact factor:
0
Authors:
Kejsi Take; Victoria Zhong; Chris Geeng; Emmi Bevensee; Damon McCoy; Rachel Greenstadt
Corresponding author:
Rachel Greenstadt