NSF-NSERC: SaTC: CORE: Small: Managing Risks of AI-generated Code in the Software Supply Chain
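Basic Information
- Award No.: 2341206
- Principal Investigator: Rachel Greenstadt
- Amount: $600,000
- Institution Country: United States
- Award Type: Standard Grant
- Fiscal Year: 2024
- Funding Country: United States
- Period: 2024-06-01 to 2027-05-31
- Status: Ongoing
Abstract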
Modern software is created by combining pre-existing software packages into a software product. This approach is enabled by the growing popularity of the open-source paradigm, in which the source code of software packages is released under licenses that permit reuse. It speeds up software development and brings significant economic benefits, but it also creates the risk of inadvertently importing vulnerable code into critical software tools. The risk is further compounded by the increasing use of Artificial Intelligence (AI) tools for code generation in open-source development. These tools must be trained on enormous amounts of data that are not always rigorously reviewed, so they may learn to generate vulnerable code. Worse, malicious parties may actively inject malicious code into the models' training data. All of these issues remain poorly understood. This project aims to measure and mitigate the risks that AI-generated code introduces into the software supply chain. It will investigate how prevalent the use of AI tools is and characterize the security risks they entail. In doing so, it addresses pressing economic and societal needs: AI promises significant benefits for software development, but those benefits can only be realized if its risks are mitigated. The research outcomes will be disseminated through workshops and hackathons, and the results will be incorporated into curricula and courses. The work will benefit the open-source community by producing provenance tools that improve software supply chain security. The project is a collaboration with researchers in Canada whose complementary expertise brings additional resources to the project. Technically, the AI tools under investigation are Large Language Models (LLMs) for code generation.
The threat model of interest is one in which a developer inserts vulnerable LLM-generated code into a security-critical program, whether because of low-quality code generation or because of a poisoned/backdoored LLM. The project consists of three thrusts, each addressing a research question relevant to this threat model: (i) how, and to what extent, LLM-generated code can be distinguished from human-written code; (ii) to what extent LLM-generated code is already present in the supply chain, and what its security implications are; and (iii) to what extent poisoning attacks against LLM code generation can succeed under realistic conditions. In thrust (i), the project extends existing code stylometry techniques, previously used to distinguish between human programmers, to the novel problem of distinguishing human-written from LLM-generated code. In thrust (ii), the investigators conduct measurement studies of open-source software to build an empirical understanding of the presence and implications of LLM-generated code in the supply chain. Finally, thrust (iii) examines the practical feasibility of code backdoors and the effectiveness of automated reputation-based vetting as a defense. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal Articles (0)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Other Publications by Rachel Greenstadt
Challenges in Restructuring Community-based Moderation
- DOI: 10.48550/arxiv.2402.17880
- Year: 2024
- Authors: Chau Tran; Kejsi Take; Kaylea Champion; Benjamin Mako Hill; Rachel Greenstadt
- Corresponding Author: Rachel Greenstadt
From User Insights to Actionable Metrics: A User-Focused Evaluation of Privacy-Preserving Browser Extensions
- Year: 2024
- Authors: Ritik Roongta; Rachel Greenstadt
- Corresponding Author: Rachel Greenstadt
Stoking the Flames: Understanding Escalation in an Online Harassment Community
- DOI: 10.1145/3641015
- Year: 2024
- Authors: Kejsi Take; Victoria Zhong; Chris Geeng; Emmi Bevensee; Damon McCoy; Rachel Greenstadt
- Corresponding Author: Rachel Greenstadt
Feature Vector Difference based Authorship Verification for Open-World Settings
- Year: 2021
- Authors: Janith Weerasinghe; Rhia Singh; Rachel Greenstadt
- Corresponding Author: Rachel Greenstadt
Other Grants by Rachel Greenstadt
Collaborative Research: Conference: 2023 Workshop for Aspiring PIs in Secure and Trusted Cyberspace
- Award No.: 2247405
- Fiscal Year: 2023
- Amount: $600,000
- Award Type: Standard Grant
Collaborative Research: SaTC: CORE: Medium: Threat Intelligence for Targets of Coordinated Harassment
- Award No.: 2016061
- Fiscal Year: 2020
- Amount: $600,000
- Award Type: Standard Grant
SaTC: CORE: Medium: Collaborative: Measuring the Value of Anonymous Online Participation
- Award No.: 2031951
- Fiscal Year: 2019
- Amount: $600,000
- Award Type: Continuing Grant
SaTC: CORE: Small: Collaborative: Understanding and Mitigating Adversarial Manipulation of Content Curation Algorithms
- Award No.: 1931005
- Fiscal Year: 2019
- Amount: $600,000
- Award Type: Standard Grant
SaTC: CORE: Small: Collaborative: Understanding and Mitigating Adversarial Manipulation of Content Curation Algorithms
- Award No.: 1813697
- Fiscal Year: 2018
- Amount: $600,000
- Award Type: Standard Grant
SaTC: CORE: Medium: Collaborative: Measuring the Value of Anonymous Online Participation
- Award No.: 1703736
- Fiscal Year: 2017
- Amount: $600,000
- Award Type: Continuing Grant
Student Travel Support: Privacy Enhancing Technology Symposium (PETS) 2015
- Award No.: 1523108
- Fiscal Year: 2015
- Amount: $600,000
- Award Type: Standard Grant
CAREER: Privacy Analytics for Users in a Big Data World
- Award No.: 1253418
- Fiscal Year: 2013
- Amount: $600,000
- Award Type: Continuing Grant
EAGER: Investigating Diversity in Online Community Filtering
- Award No.: 1048515
- Fiscal Year: 2010
- Amount: $600,000
- Award Type: Standard Grant
Similar International Grants
NSF-NSERC: Fairness Fundamentals: Geometry-inspired Algorithms and Long-term Implications
- Award No.: 2342253
- Fiscal Year: 2024
- Amount: $600,000
- Award Type: Standard Grant
NSF-NSERC: Building a two-qubit controlled phase gate using laterally coupled semiconductor quantum dots
- Award No.: 2317047
- Fiscal Year: 2023
- Amount: $600,000
- Award Type: Standard Grant
NSERC/BC SPCA Industrial Research Chair in Animal Welfare
- Award No.: 554745-2019
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Industrial Research Chairs
NSERC/Industrial Research Chair for Smart Supply Systems within the connected forest value chain
- Award No.: 545469-2018
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Industrial Research Chairs
NSERC Industrial Research Chair in Swine Welfare
- Award No.: 522207-2016
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Industrial Research Chairs
NSERC CREATE in Science Leadership for Global Sustainability
- Award No.: 543314-2020
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Collaborative Research and Training Experience
NSERC Industrial Research Chair for Colleges in Northern Mine Remediation
- Award No.: 533814-2018
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Industrial Research Chairs for Colleges Grants
L2M NSERC - A Novel Mechanical Sensor for Online Flow Rate Monitoring in Subsea Pipeline Networks
- Award No.: 580749-2023
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Idea to Innovation
NSERC Chair for Women in Science and Engineering (Québec)
- Award No.: 548938-2019
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Chairs for Women in Science and Engineering - Project
L2M NSERC - Portfolio Management System for the Energy Industry
- Award No.: 580693-2023
- Fiscal Year: 2022
- Amount: $600,000
- Award Type: Idea to Innovation