
SaTC: CORE: Small: Systematic Threat Characterization and Prevention in Open-Domain Dialog Systems

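Basic Information

Award number:
2231002
Principal investigator:
Bimal Viswanath
Amount:
$600,000
Host institution:
Virginia Polytechnic Institute and State University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-02-01 to 2026-01-31
Project status:
Ongoing (not yet concluded)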

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2231002&HistoricalAwards=false
Keywords:
SaTC CORE Small Systematic Threat

Project Abstract

Dialog systems or chatbots powered by deep neural networks are increasingly being deployed at scale without understanding the vulnerabilities impacting them. Using specially designed learning algorithms, these chatbots are trained to learn from existing human-human conversation data to produce convincing conversations on a variety of topics. However, biases in the training data, including intentionally injected ones, can make these systems ripe for abuse by malicious actors who aim to trigger toxic or harmful conversations. This may expose vulnerable users to potential harms, given the lack of attention to security in existing deployments and the fact that they are used in sensitive domains such as healthcare, emotional support, and the U.S. justice system. This project will systematically characterize a variety of threats impacting chatbot systems, then build novel deployable defenses to measure toxicity, uncover hidden vulnerabilities, detoxify impacted systems, and enable attack-resilient training pipelines. The project will also create partnerships between multiple computer science disciplines and between industry and academia to raise awareness of and defend against these threats. The project provides unique opportunities to underrepresented K-12 students to study emerging topics in the field of machine learning and security, aiming to attract them towards STEM careers.

This project has three research thrusts. The first is conducting a large-scale measurement study using widely used chatbot pipelines to characterize their vulnerability to unintentionally and intentionally injected toxicity. Toxicity injection attacks are characterized using a novel, fully automated pipeline that leverages large language models with minimal human supervision, allowing the methods to scale. The second thrust is developing a novel generative modeling approach to probe chatbots for hidden toxicity vulnerabilities, and to detoxify models and create safety benchmarks. The third thrust builds on the earlier findings to develop a novel attack-agnostic training pipeline that is resilient to toxicity injection attacks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A First Look at Toxicity Injection Attacks on Open-domain Chatbots

DOI:
10.1145/3627106.3627122
Published:
2023-12
Journal:
Proceedings of the 39th Annual Computer Security Applications Conference
Impact factor:
0
Authors:
Connor Weeks; Aravind Cheruvu; Sifat Muhammad Abdullah; Shravya Kanchi; Daphne Yao; Bimal Viswanath
Corresponding authors:
Connor Weeks; Aravind Cheruvu; Sifat Muhammad Abdullah; Shravya Kanchi; Daphne Yao; Bimal Viswanath

Other Publications by Bimal Viswanath

Keeping information safe from social networking apps

DOI:
10.1145/2342549.2342561
Published:
2012
Journal:
Proceedings of the 22nd International Conference on World Wide Web
Impact factor:
0
Authors:
Bimal Viswanath; Emre Kıcıman; S. Saroiu
Corresponding author:
S. Saroiu

Towards trustworthy social computing systems

DOI:
10.22028/d291-25429
Published:
2016
Journal:
Impact factor:
0
Authors:
Bimal Viswanath
Corresponding author:
Bimal Viswanath

What Happens After You Leak Your Password: Understanding Credential Sharing on Phishing Sites

DOI:
Published:
2019
Journal:
ACM Asia Conference on Computer and Communications Security
Impact factor:
0
Authors:
Peng Peng; Chao Xu; Luke Quinn; Hang Hu; Bimal Viswanath; Gang Wang
Corresponding author:
Gang Wang

Exploring the design space of social network-based Sybil defenses

DOI:
Published:
2012
Journal:
International Conference on Communication Systems and Networks
Impact factor:
0
Authors:
Bimal Viswanath; Mainack Mondal; Allen Clement; P. Druschel; K. Gummadi; A. Mislove; Ansley Post
Corresponding author:
Ansley Post

Strength in Numbers: Robust Tamper Detection in Crowd Computations

DOI:
10.1145/2817946.2817964
Published:
2015
Journal:
Proceedings of the 2015 ACM on Conference on Online Social Networks
Impact factor:
0
Authors:
Bimal Viswanath; M. Bashir; M. B. Zafar; S. Bouget; S. Guha; K. Gummadi; Aniket Kate; A. Mislove
Corresponding author:
A. Mislove