Collaborative Research: RI: Small: Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness


Basic Information

  • Award Number:
    2008956
  • Principal Investigator:
  • Amount:
    $225,000
  • Host Institution:
  • Host Institution Country:
    United States
  • Program Type:
    Standard Grant
  • Fiscal Year:
    2020
  • Funding Country:
    United States
  • Project Period:
    2020-10-01 to 2023-09-30
  • Project Status:
    Completed

Project Summary

The successful adoption of machine learning (ML) models in critical domains such as healthcare and criminal justice relies heavily on how well decision makers are able to understand and trust the functionality of these models. However, the proprietary nature and increasing complexity of ML models make it challenging for domain experts to understand these complex "black boxes". Consequently, there has been a recent surge in techniques that explain black box models in a human-interpretable manner by approximating them using simpler models. However, it is unclear to what extent these post hoc explanation techniques may mislead end users by giving them a false sense of security, and luring them into trusting and deploying untrustworthy black boxes. This project will build rigorous frameworks to expose the vulnerabilities of existing explanation techniques, assess how these vulnerabilities can manifest in real-world applications, and develop new techniques to defend against these vulnerabilities. This project has the potential to significantly speed up the adoption of ML in a variety of domains including criminal justice (e.g., bail decisions), health care (e.g., patient diagnosis and treatment), and financial lending (e.g., loan approval). The goal of this project is to characterize the vulnerabilities of existing explanation techniques, understand how adversaries can exploit these vulnerabilities, and develop techniques to defend against them.
The project will focus on the following subtasks: 1) understanding the real-world consequences of misleading explanations by conducting user studies and detailed interviews with domain experts in healthcare and criminal justice; 2) identifying critical vulnerabilities in state-of-the-art explanation techniques that can be exploited by adversarial entities to generate misleading explanations; and 3) developing novel techniques for building robust and reliable explanations that are not prone to these vulnerabilities and thereby provide domain experts and other stakeholders with faithful explanations of complex black box models. With these contributions, the project will initiate a new body of research in ML interpretability that focuses on understanding how adversaries can manipulate explanation techniques, and how to defend against such attacks. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
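The abstract describes post hoc explanation techniques that approximate a black box model locally with a simpler, interpretable one. A minimal sketch of that idea in the style of a local surrogate (LIME-like) explainer is shown below; the black box, synthetic data, and function names here are illustrative assumptions, not artifacts of the award itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A stand-in "black box": a random forest trained on synthetic data
# where only features 0 and 1 determine the label.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def local_surrogate_explanation(model, x, n_samples=1000, sigma=0.5):
    """Fit a proximity-weighted linear model around x; its coefficients
    serve as local feature importances for the black box's prediction."""
    # Perturb the instance and query the black box on the perturbations.
    Z = x + sigma * rng.normal(size=(n_samples, x.shape[0]))
    preds = model.predict_proba(Z)[:, 1]
    # Weight each perturbation by its proximity to x (RBF kernel).
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_

x0 = np.array([1.0, -0.5, 0.0, 0.0])
importances = local_surrogate_explanation(black_box, x0)
# Features 0 and 1 drive the label, so their coefficients should dominate.
print(importances)
```

The project's point is that this approximation step is exactly where an adversary can intervene: a model that behaves differently on off-manifold perturbations than on real inputs can receive an innocuous-looking surrogate explanation while remaining untrustworthy in deployment.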

Project Outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
  • DOI:
  • Publication Date:
    2020-08
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
  • Corresponding Author:
    Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
An Empirical Comparison of Instance Attribution Methods for NLP
  • DOI:
    10.18653/v1/2021.naacl-main.75
  • Publication Date:
    2021-04
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Pouya Pezeshkpour;Sarthak Jain;Byron C. Wallace;Sameer Singh
  • Corresponding Author:
    Pouya Pezeshkpour;Sarthak Jain;Byron C. Wallace;Sameer Singh
Counterfactual Explanations Can Be Manipulated
Combining Feature and Instance Attribution to Detect Artifacts
  • DOI:
    10.18653/v1/2022.findings-acl.153
  • Publication Date:
    2021-07
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Pouya Pezeshkpour;Sarthak Jain;Sameer Singh;Byron C. Wallace
  • Corresponding Author:
    Pouya Pezeshkpour;Sarthak Jain;Sameer Singh;Byron C. Wallace
Rethinking Explainability as a Dialogue: A Practitioner's Perspective
  • DOI:
  • Publication Date:
    2022-02
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Himabindu Lakkaraju;Dylan Slack;Yuxin Chen;Chenhao Tan;Sameer Singh
  • Corresponding Author:
    Himabindu Lakkaraju;Dylan Slack;Yuxin Chen;Chenhao Tan;Sameer Singh

Other Publications by Sameer Singh
A survey of object recognition methods for automatic asset detection in high-definition video
Multi-stage Classification for Audio Based Activity Recognition
  • DOI:
    10.1007/11875581_100
  • Publication Date:
    2006
  • Journal:
  • Impact Factor:
    0
  • Authors:
    José Lopes;Charles Lin;Sameer Singh
  • Corresponding Author:
    Sameer Singh
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
  • DOI:
    10.48550/arxiv.2402.03244
  • Publication Date:
    2024
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Kolby Nottingham;Bodhisattwa Prasad Majumder;Bhavana Dalvi;Sameer Singh;Peter Clark;Roy Fox
  • Corresponding Author:
    Roy Fox
Modeling Performance of Different Classification Methods: Deviation from the Power Law
  • DOI:
  • Publication Date:
    2005
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Sameer Singh
  • Corresponding Author:
    Sameer Singh
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
  • DOI:
  • Publication Date:
    2022
  • Journal:
  • Impact Factor:
    0
  • Authors:
    A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
  • Corresponding Author:
    Sameer Singh


Other Grants of Sameer Singh

CAREER: Detecting, Understanding, and Fixing Vulnerabilities in Natural Language Processing Models
  • Award Number:
    2046873
  • Fiscal Year:
    2021
  • Funding Amount:
    $225,000
  • Program Type:
    Continuing Grant
CCRI: ENS: Machine Learning Democratization via a Linked, Annotated Repository of Datasets
  • Award Number:
    1925741
  • Fiscal Year:
    2019
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
CRII: RI: Explaining Decisions of Black-box Models via Input Perturbations
  • Award Number:
    1756023
  • Fiscal Year:
    2018
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
RI: Small: Modeling Multiple Modalities for Knowledge-Base Construction
  • Award Number:
    1817183
  • Fiscal Year:
    2018
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant

Similar NSFC Grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award Number:
    24ZR1403900
  • Award Year:
    2024
  • Funding Amount:
    ¥0
  • Program Type:
    Provincial/Municipal Project
Cell Research
  • Award Number:
    31224802
  • Award Year:
    2012
  • Funding Amount:
    ¥240,000
  • Program Type:
    Special Fund Project
Cell Research
  • Award Number:
    31024804
  • Award Year:
    2010
  • Funding Amount:
    ¥240,000
  • Program Type:
    Special Fund Project
Cell Research
  • Award Number:
    30824808
  • Award Year:
    2008
  • Funding Amount:
    ¥240,000
  • Program Type:
    Special Fund Project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award Number:
    10774081
  • Award Year:
    2007
  • Funding Amount:
    ¥450,000
  • Program Type:
    General Program

Similar Overseas Grants

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312841
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312842
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
  • Award Number:
    2313131
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313151
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Continuing Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award Number:
    2232298
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312840
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
  • Award Number:
    2345528
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award Number:
    2232055
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant