CAREER: Detecting, Understanding, and Fixing Vulnerabilities in Natural Language Processing Models

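Basic information

  • Award number:
    2046873
  • Principal investigator:
  • Amount:
    $500,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-07-01 to 2026-06-30
  • Project status:
    Ongoing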

Project abstract

With recent advances in machine learning, models have achieved high accuracy on many challenging tasks in natural language processing (NLP) such as question answering, machine translation, and dialog agents, sometimes coming close to or beating human performance on these benchmarks. However, these NLP models often suffer from brittleness in many different ways: they latch onto erroneous artifacts, do not support natural variations in language, are not robust to adversarial attacks, and only work on a few domains. Existing pipelines for developing NLP models lack support for useful insights, and identifying bugs requires considerable effort from experts both in machine learning and the domain. This CAREER project develops several techniques to support this need for more robust training and evaluation pipelines for NLP, providing easy-to-use, scalable, and accurate mechanisms for identifying, understanding, and addressing NLP models' vulnerabilities. The developed methods will support diverse application areas such as conversational agents, sentiment classifiers, and abuse/hate speech detection. Further, the team engages with the developers of NLP models in academia and industry to develop a data science curriculum for K-12 education, particularly for students from underrepresented communities.

Based on the notion of vulnerability as unexpected behavior on certain input transformations, the team will contribute across the following three thrusts. The first thrust identifies vulnerabilities by testing user-defined behaviors and searching over many possible vulnerabilities. In the second thrust, the investigators develop methods to understand the model's vulnerabilities by tracing the causes of errors to individual training data points and data artifacts. The last thrust will develop approaches to address vulnerabilities in models by directly injecting the vulnerability definitions into the model during training and using explanation-based annotations to supervise the models. These thrusts build upon the goals of behavioral testing, explanation-based interactions, and architecture agnosticism to support most current and future NLP models and applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
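
The abstract's notion of a vulnerability as unexpected behavior under an input transformation can be illustrated with a small invariance check. The following is a minimal sketch only, not the project's method or tooling: the toy keyword classifier, the two transformations, and every function name are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical toy model: a keyword-based sentiment classifier standing in
# for any NLP model that maps a text to a label.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_sentiment(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def swap_first_two_chars(text: str) -> str:
    """Hypothetical typo transformation: swap the first two characters
    of the longest word (the first one, if several tie)."""
    words = text.split()
    if not words:
        return text
    i = max(range(len(words)), key=lambda k: len(words[k]))
    w = words[i]
    if len(w) > 1:
        words[i] = w[1] + w[0] + w[2:]
    return " ".join(words)


def add_neutral_suffix(text: str) -> str:
    """Hypothetical transformation that should not change sentiment."""
    return text + " today"


def find_invariance_violations(
    model: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str, str, str]]:
    """Return (original, transformed, label_before, label_after) for every
    input whose predicted label changes under the transformation."""
    violations = []
    for text in inputs:
        changed = transform(text)
        before, after = model(text), model(changed)
        if before != after:
            violations.append((text, changed, before, after))
    return violations


inputs = ["the food was great", "I hate waiting"]
for name, transform in [("typo", swap_first_two_chars), ("suffix", add_neutral_suffix)]:
    print(name, find_invariance_violations(toy_sentiment, transform, inputs))
```

Each user-defined behavior is expressed here as a transformation under which the label is expected to stay the same, so the check treats the model as a black box and applies to any function from text to label.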

Project outcomes

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Quantifying Social Biases Using Templates is Unreliable
Explaining machine learning models with interactive natural language conversations using TalkToModel
  • DOI:
    10.1038/s42256-023-00692-8
  • Published:
    2022-07
  • Journal:
  • Impact factor:
    23.8
  • Authors:
    Dylan Slack; Satyapriya Krishna; Himabindu Lakkaraju; Sameer Singh
  • Corresponding authors:
    Dylan Slack; Satyapriya Krishna; Himabindu Lakkaraju; Sameer Singh
MISGENDERED: Limits of Large Language Models in Understanding Pronouns
Combining Feature and Instance Attribution to Detect Artifacts
  • DOI:
    10.18653/v1/2022.findings-acl.153
  • Published:
    2021-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Pouya Pezeshkpour; Sarthak Jain; Sameer Singh; Byron C. Wallace
  • Corresponding authors:
    Pouya Pezeshkpour; Sarthak Jain; Sameer Singh; Byron C. Wallace
TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sameer Singh
  • Corresponding author:
    Sameer Singh

Other publications by Sameer Singh

A survey of object recognition methods for automatic asset detection in high-definition video
Multi-stage Classification for Audio Based Activity Recognition
  • DOI:
    10.1007/11875581_100
  • Published:
    2006
  • Journal:
  • Impact factor:
    0
  • Authors:
    José Lopes; Charles Lin; Sameer Singh
  • Corresponding author:
    Sameer Singh
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
  • DOI:
    10.48550/arxiv.2402.03244
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Kolby Nottingham; Bodhisattwa Prasad Majumder; Bhavana Dalvi; Sameer Singh; Peter Clark; Roy Fox
  • Corresponding author:
    Roy Fox
Modeling Performance of Different Classification Methods: Deviation from the Power Law
  • DOI:
  • Published:
    2005
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sameer Singh
  • Corresponding author:
    Sameer Singh
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
  • Corresponding author:
    Sameer Singh

Other grants held by Sameer Singh

Collaborative Research: RI: Small: Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness
  • Award number:
    2008956
  • Fiscal year:
    2020
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
CCRI: ENS: Machine Learning Democratization via a Linked, Annotated Repository of Datasets
  • Award number:
    1925741
  • Fiscal year:
    2019
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
CRII: RI: Explaining Decisions of Black-box Models via Input Perturbations
  • Award number:
    1756023
  • Fiscal year:
    2018
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
RI: Small: Modeling Multiple Modalities for Knowledge-Base Construction
  • Award number:
    1817183
  • Fiscal year:
    2018
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant

Similar overseas grants

Detecting and Understanding Disparities in Pediatric Safety Events for Hospitalized Children
  • Award number:
    10661525
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Project type:
Detecting and Understanding Disparities in Pediatric Safety Events for Hospitalized Children
  • Award number:
    10450528
  • Fiscal year:
    2022
  • Funding amount:
    $500,000
  • Project type:
SaTC: CORE: Small: Collaborative: Understanding and Detecting Memory Bugs in Rust
  • Award number:
    1955965
  • Fiscal year:
    2020
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
SaTC: CORE: Small: Collaborative: Understanding and Detecting Memory Bugs in Rust
  • Award number:
    1956364
  • Fiscal year:
    2020
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
CRII: CSR: Toward Understanding and Automatically Detecting Specious Configuration in Large Systems
  • Award number:
    1755737
  • Fiscal year:
    2018
  • Funding amount:
    $500,000
  • Project type:
    Standard Grant
Detecting short-period gravity changes associated with magma mass redistribution: toward further understanding of volcanic eruption processes
  • Award number:
    15K17749
  • Fiscal year:
    2015
  • Funding amount:
    $500,000
  • Project type:
    Grant-in-Aid for Young Scientists (B)
Detecting and understanding mycobacterial infections in domestic cats and their role in the risk of zoonotic, intra- and inter-species spread
  • Award number:
    1658195
  • Fiscal year:
    2015
  • Funding amount:
    $500,000
  • Project type:
    Studentship
Detecting and understanding mycobacterial infections in domestic cats and their role in the risk of zoonotic, intra- and inter-species spread.
  • Award number:
    BB/M014894/1
  • Fiscal year:
    2015
  • Funding amount:
    $500,000
  • Project type:
    Training Grant
Understanding, detecting, monitoring and treating brain dysfunctions due to chronic immune diseases
  • Award number:
    nhmrc : 1045400
  • Fiscal year:
    2013
  • Funding amount:
    $500,000
  • Project type:
    Career Development Fellowships
Translatable biotechnology for detecting oxidative amino acid modifications and understanding of the role of radicals in inflammation
  • Award number:
    BB/J012939/1
  • Fiscal year:
    2012
  • Funding amount:
    $500,000
  • Project type:
    Training Grant