CAREER: Detecting, Understanding, and Fixing Vulnerabilities in Natural Language Processing Models

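Basic Information

Award number:
2046873
Principal investigator:
Sameer Singh
Amount:
$500,000
Institution:
University of California-Irvine
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Project period:
2021-07-01 to 2026-06-30
Project status:
Ongoing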

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2046873&HistoricalAwards=false
Keywords:
CAREER Detecting Understanding Fixing Vulnerabilities

Project Abstract

With recent advances in machine learning, models have achieved high accuracy on many challenging tasks in natural language processing (NLP) such as question answering, machine translation, and dialog agents, sometimes coming close to or beating human performance on these benchmarks. However, these NLP models often suffer from brittleness in many different ways: they latch onto erroneous artifacts, do not support natural variations in language, are not robust to adversarial attacks, and only work on a few domains. Existing pipelines for developing NLP models lack support for useful insights, and identifying bugs requires considerable effort from experts both in machine learning and the domain. This CAREER project develops several techniques to support this need for more robust training and evaluation pipelines for NLP, providing easy-to-use, scalable, and accurate mechanisms for identifying, understanding, and addressing NLP models' vulnerabilities. The developed methods will support diverse application areas such as conversational agents, sentiment classifiers, and abuse/hate speech detection. Further, the team engages with the developers of NLP models in academia and industry to develop a data science curriculum for K-12 education, particularly for students from underrepresented communities.

Based on the notion of vulnerability as unexpected behavior on certain input transformations, the team will contribute across the following three thrusts. The first thrust identifies vulnerabilities by testing user-defined behaviors and searching over many possible vulnerabilities. In the second thrust, the investigators develop methods to understand the model's vulnerabilities by tracing the causes of errors to individual training data points and data artifacts. The last thrust will develop approaches to address vulnerabilities in models by directly injecting the vulnerability definitions into the model during training and using explanation-based annotations to supervise the models. These thrusts build upon the goals of behavioral testing, explanation-based interactions, and architecture agnosticism to support most current and future NLP models and applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Quantifying Social Biases Using Templates is Unreliable

DOI:
Published:
2022
Journal:
NeurIPS Workshop on Trustworthy and Socially Responsible Machine Learning (TSRML)
Impact factor:
0
Authors:
Seshadri, Preethi; Pezeshkpour, Pouya; Singh, Sameer
Corresponding author:
Singh, Sameer

Explaining machine learning models with interactive natural language conversations using TalkToModel

DOI:
10.1038/s42256-023-00692-8
Published:
2022-07
Journal:
Nature Machine Intelligence
Impact factor:
23.8
Authors:
Dylan Slack; Satyapriya Krishna; Himabindu Lakkaraju; Sameer Singh
Corresponding authors:
Dylan Slack; Satyapriya Krishna; Himabindu Lakkaraju; Sameer Singh

MISGENDERED: Limits of Large Language Models in Understanding Pronouns

DOI:
10.18653/v1/2023.acl-long.293
Published:
2023
Journal:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Impact factor:
0
Authors:
Hossain, Tamanna; Dev, Sunipa; Singh, Sameer
Corresponding author:
Singh, Sameer

Combining Feature and Instance Attribution to Detect Artifacts

DOI:
10.18653/v1/2022.findings-acl.153
Published:
2021-07
Journal:
ArXiv
Impact factor:
0
Authors:
Pouya Pezeshkpour; Sarthak Jain; Sameer Singh; Byron C. Wallace
Corresponding authors:
Pouya Pezeshkpour; Sarthak Jain; Sameer Singh; Byron C. Wallace

TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations

DOI:
Published:
2022
Journal:
Impact factor:
0
Authors:
Sameer Singh
Corresponding author:
Sameer Singh

Other publications by Sameer Singh

A survey of object recognition methods for automatic asset detection in high-definition video

DOI:
10.1109/ukricis.2010.5898117
Published:
2010
Journal:
2010 IEEE 9th International Conference on Cybernetic Intelligent Systems
Impact factor:
0
Authors:
Thomas Warsop; Sameer Singh
Corresponding author:
Sameer Singh

Multi-stage Classification for Audio Based Activity Recognition

DOI:
10.1007/11875581_100
Published:
2006
Journal:
Impact factor:
0
Authors:
José Lopes; Charles Lin; Sameer Singh
Corresponding author:
Sameer Singh

Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

DOI:
10.48550/arxiv.2402.03244
Published:
2024
Journal:
ArXiv
Impact factor:
0
Authors:
Kolby Nottingham; Bodhisattwa Prasad Majumder; Bhavana Dalvi; Sameer Singh; Peter Clark; Roy Fox
Corresponding author:
Roy Fox

Modeling Performance of Different Classification Methods: Deviation from the Power Law

DOI:
Published:
2005
Journal:
Impact factor:
0
Authors:
Sameer Singh
Corresponding author:
Sameer Singh

ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution

DOI:
Published:
2022
Journal:
Impact factor:
0
Authors:
A. Crowdsourced; David Bamman; Olivia Lewke; Rachel Bawden; Rico Sennrich; Alexandra Birch; Ari Bornstein; Arie Cattan; Ido Dagan; Hong Chen; Zhenhua Fan; Hao Lu; Alan Yuille; Eduard Hovy; Mitch Marcus; M. Palmer; Lance; Rodney Huddleston. 2002; Frédéric Landragin; T. Poibeau; Bernard Vic; Belinda Z. Li; Gabriel Stanovsky; Robert L Logan; Andrew McCallum; Sameer Singh
Corresponding author:
Sameer Singh