Collaborative Research: RI: Small: Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness


Basic Information

  • Award Number:
    2008956
  • Principal Investigator:
  • Amount:
    $225,000
  • Host Institution:
  • Host Institution Country:
    United States
  • Program Type:
    Standard Grant
  • Fiscal Year:
    2020
  • Funding Country:
    United States
  • Project Period:
    2020-10-01 to 2023-09-30
  • Project Status:
    Completed

Project Summary

The successful adoption of machine learning (ML) models in critical domains such as healthcare and criminal justice relies heavily on how well decision makers are able to understand and trust the functionality of these models. However, the proprietary nature and increasing complexity of ML models make it challenging for domain experts to understand these complex "black boxes". Consequently, there has been a recent surge in techniques that explain black box models in a human-interpretable manner by approximating them using simpler models. However, it is unclear to what extent these post hoc explanation techniques may mislead end users by giving them a false sense of security, and luring them into trusting and deploying untrustworthy black boxes. This project will build rigorous frameworks to expose the vulnerabilities of existing explanation techniques, assess how these vulnerabilities can manifest in real-world applications, and develop new techniques to defend against these vulnerabilities. This project has the potential to significantly speed up the adoption of ML in a variety of domains including criminal justice (e.g., bail decisions), health care (e.g., patient diagnosis and treatment), and financial lending (e.g., loan approval). The goal of this project is to characterize the vulnerabilities of existing explanation techniques, understand how adversaries can exploit these vulnerabilities, and develop techniques to defend against them.
The project will focus on the following subtasks: 1) understanding the real-world consequences of misleading explanations by conducting user studies and detailed interviews with domain experts in healthcare and criminal justice; 2) identifying critical vulnerabilities in state-of-the-art explanation techniques that can be exploited by adversarial entities to generate misleading explanations; and 3) developing novel techniques for building robust and reliable explanations that are not prone to these vulnerabilities and thereby provide domain experts and other stakeholders with faithful explanations of complex black box models. With these contributions, the project will initiate a new body of research in ML interpretability that focuses on understanding how adversaries can manipulate explanation techniques, and how to defend against such attacks. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
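The abstract describes post hoc explanation techniques that approximate a black box model locally with a simpler, interpretable one. A minimal sketch of that idea in the style of a local surrogate (LIME-like) explainer is shown below; the black box, synthetic data, and function names here are illustrative assumptions, not artifacts of the award itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A stand-in "black box": a random forest trained on synthetic data
# where only features 0 and 1 determine the label.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def local_surrogate_explanation(model, x, n_samples=1000, sigma=0.5):
    """Fit a proximity-weighted linear model around x; its coefficients
    serve as local feature importances for the black box's prediction."""
    # Perturb the instance and query the black box on the perturbations.
    Z = x + sigma * rng.normal(size=(n_samples, x.shape[0]))
    preds = model.predict_proba(Z)[:, 1]
    # Weight each perturbation by its proximity to x (RBF kernel).
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_

x0 = np.array([1.0, -0.5, 0.0, 0.0])
importances = local_surrogate_explanation(black_box, x0)
# Features 0 and 1 drive the label, so their coefficients should dominate.
print(importances)
```

The project's point is that this approximation step is exactly where an adversary can intervene: a model that behaves differently on off-manifold perturbations than on real inputs can receive an innocuous-looking surrogate explanation while remaining untrustworthy in deployment.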

Project Outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
  • DOI:
  • Publication Date:
    2020-08
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
  • Corresponding Author:
    Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
An Empirical Comparison of Instance Attribution Methods for NLP
  • DOI:
    10.18653/v1/2021.naacl-main.75
  • Publication Date:
    2021-04
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Pouya Pezeshkpour;Sarthak Jain;Byron C. Wallace;Sameer Singh
  • Corresponding Author:
    Pouya Pezeshkpour;Sarthak Jain;Byron C. Wallace;Sameer Singh
Counterfactual Explanations Can Be Manipulated
Combining Feature and Instance Attribution to Detect Artifacts
  • DOI:
    10.18653/v1/2022.findings-acl.153
  • Publication Date:
    2021-07
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Pouya Pezeshkpour;Sarthak Jain;Sameer Singh;Byron C. Wallace
  • Corresponding Author:
    Pouya Pezeshkpour;Sarthak Jain;Sameer Singh;Byron C. Wallace
Rethinking Explainability as a Dialogue: A Practitioner's Perspective
  • DOI:
  • Publication Date:
    2022-02
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Himabindu Lakkaraju;Dylan Slack;Yuxin Chen;Chenhao Tan;Sameer Singh
  • Corresponding Author:
    Himabindu Lakkaraju;Dylan Slack;Yuxin Chen;Chenhao Tan;Sameer Singh

Other Publications by Sameer Singh
A survey of object recognition methods for automatic asset detection in high-definition video
Multi-stage Classification for Audio Based Activity Recognition
  • DOI:
    10.1007/11875581_100
  • Publication Date:
    2006
  • Journal:
  • Impact Factor:
    0
  • Authors:
    José Lopes;Charles Lin;Sameer Singh
  • Corresponding Author:
    Sameer Singh
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
  • DOI:
    10.48550/arxiv.2402.03244
  • Publication Date:
    2024
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Kolby Nottingham;Bodhisattwa Prasad Majumder;Bhavana Dalvi;Sameer Singh;Peter Clark;Roy Fox
  • Corresponding Author:
    Roy Fox
Modeling Performance of Different Classification Methods: Deviation from the Power Law
  • DOI:
  • Publication Date:
    2005
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Sameer Singh
  • Corresponding Author:
    Sameer Singh
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
  • DOI:
  • Publication Date:
    2022
  • Journal:
  • Impact Factor:
    0
  • Authors:
    A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
  • Corresponding Author:
    Sameer Singh


Other Grants of Sameer Singh

CAREER: Detecting, Understanding, and Fixing Vulnerabilities in Natural Language Processing Models
  • Award Number:
    2046873
  • Fiscal Year:
    2021
  • Funding Amount:
    $225,000
  • Program Type:
    Continuing Grant
CCRI: ENS: Machine Learning Democratization via a Linked, Annotated Repository of Datasets
  • Award Number:
    1925741
  • Fiscal Year:
    2019
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
CRII: RI: Explaining Decisions of Black-box Models via Input Perturbations
  • Award Number:
    1756023
  • Fiscal Year:
    2018
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
RI: Small: Modeling Multiple Modalities for Knowledge-Base Construction
  • Award Number:
    1817183
  • Fiscal Year:
    2018
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant

Similar NSFC Grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award Number:
    24ZR1403900
  • Award Year:
    2024
  • Funding Amount:
    ¥0
  • Program Type:
    Provincial/Municipal Project
Cell Research
  • Award Number:
    31224802
  • Award Year:
    2012
  • Funding Amount:
    ¥240,000
  • Program Type:
    Special Fund Project
Cell Research
  • Award Number:
    31024804
  • Award Year:
    2010
  • Funding Amount:
    ¥240,000
  • Program Type:
    Special Fund Project
Cell Research
  • Award Number:
    30824808
  • Award Year:
    2008
  • Funding Amount:
    ¥240,000
  • Program Type:
    Special Fund Project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award Number:
    10774081
  • Award Year:
    2007
  • Funding Amount:
    ¥450,000
  • Program Type:
    General Program

Similar Overseas Grants

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312841
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312842
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
  • Award Number:
    2313131
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313151
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Continuing Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award Number:
    2232298
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312840
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
  • Award Number:
    2345528
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award Number:
    2232055
  • Fiscal Year:
    2023
  • Funding Amount:
    $225,000
  • Program Type:
    Standard Grant