Collaborative Research: RI: Small: Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness
Basic Information
- Award Number: 2008461
- Principal Investigator:
- Amount: $225,000
- Host Institution:
- Host Institution Country: United States
- Award Type: Standard Grant
- Fiscal Year: 2020
- Funding Country: United States
- Project Period: 2020-10-01 to 2024-09-30
- Project Status: Completed
- Source:
- Keywords:
Project Abstract
The successful adoption of machine learning (ML) models in critical domains such as healthcare and criminal justice relies heavily on how well decision makers are able to understand and trust the functionality of these models. However, the proprietary nature and increasing complexity of ML models make it challenging for domain experts to understand these complex "black boxes". Consequently, there has been a recent surge in techniques that explain black-box models in a human-interpretable manner by approximating them using simpler models. However, it is unclear to what extent these post hoc explanation techniques may mislead end users by giving them a false sense of security, luring them into trusting and deploying untrustworthy black boxes. This project will build rigorous frameworks to expose the vulnerabilities of existing explanation techniques, assess how these vulnerabilities can manifest in real-world applications, and develop new techniques to defend against them. This project has the potential to significantly speed up the adoption of ML in a variety of domains including criminal justice (e.g., bail decisions), health care (e.g., patient diagnosis and treatment), and financial lending (e.g., loan approval).
The goal of this project is to characterize the vulnerabilities of existing explanation techniques, understand how adversaries can exploit these vulnerabilities, and develop techniques to defend against them. The project will focus on the following subtasks: 1) understanding the real-world consequences of misleading explanations by conducting user studies and detailed interviews with domain experts in healthcare and criminal justice; 2) identifying critical vulnerabilities in state-of-the-art explanation techniques that can be exploited by adversarial entities to generate misleading explanations; and 3) developing novel techniques for building robust and reliable explanations that are not prone to these vulnerabilities and thereby provide domain experts and other stakeholders with faithful explanations of complex black-box models. With these contributions, the project will initiate a new body of research in ML interpretability that focuses on understanding how adversaries can manipulate explanation techniques and how to defend against such attacks.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
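The abstract turns on one technical idea: post hoc methods such as LIME explain a black box by fitting a simpler surrogate model in a neighborhood of an input, and those surrogate fits can be unstable or manipulable. The sketch below is a minimal illustration of that idea, assuming a scikit-learn-style setup; the helper `local_surrogate_weights` and its parameters (`sigma`, `n_samples`) are illustrative inventions, not the project's code. It fits a proximity-weighted linear surrogate around a point and checks whether a small input perturbation changes the top-ranked feature, which is the kind of fragility the project proposes to expose and defend against.

```python
# Minimal sketch (illustrative, not the project's code): a LIME-style local
# surrogate explanation and a check of its stability under small perturbations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)  # the "black box"

def local_surrogate_weights(x, n_samples=1000, sigma=0.5):
    """Fit a proximity-weighted linear surrogate to the black box around x."""
    Z = x + sigma * rng.standard_normal((n_samples, x.size))  # local samples
    preds = black_box.predict_proba(Z)[:, 1]                  # black-box outputs
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-dists**2 / (2 * sigma**2))              # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_                                    # feature attributions

x0 = X[0]
e_orig = local_surrogate_weights(x0)
e_pert = local_surrogate_weights(x0 + 0.05 * rng.standard_normal(x0.size))
print("top feature (original): ", int(np.argmax(np.abs(e_orig))))
print("top feature (perturbed):", int(np.argmax(np.abs(e_pert))))
```

If the two printed indices disagree, a perturbation of the input changed which feature the surrogate ranks as most important, one concrete way a post hoc explanation can mislead its audience.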
Project Outcomes
Journal Articles (18)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations
- DOI: 10.1145/3514094.3534159
- Publication Date: 2022-05
- Journal:
- Impact Factor: 0
- Authors: Jessica Dai;Sohini Upadhyay;U. Aïvodji;Stephen H. Bach;Himabindu Lakkaraju
- Corresponding Author: Jessica Dai;Sohini Upadhyay;U. Aïvodji;Stephen H. Bach;Himabindu Lakkaraju
Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis
- DOI:
- Publication Date: 2021-06
- Journal:
- Impact Factor: 0
- Authors: Martin Pawelczyk;Chirag Agarwal;Shalmali Joshi;Sohini Upadhyay;Himabindu Lakkaraju
- Corresponding Author: Martin Pawelczyk;Chirag Agarwal;Shalmali Joshi;Sohini Upadhyay;Himabindu Lakkaraju
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
- DOI:
- Publication Date: 2020-08
- Journal:
- Impact Factor: 0
- Authors: Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
- Corresponding Author: Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
OpenXAI: Towards a Transparent Evaluation of Model Explanations
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Agarwal, Chirag;Krishna, Satyapriya;Saxena, Eshika;Pawelczyk, Martin;Johnson, Nari;Puri, Isha;Zitnik, Marinka;Lakkaraju, Himabindu
- Corresponding Author: Lakkaraju, Himabindu
Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Krishna, Satyapriya;Ma, Jiaqi;Lakkaraju, Himabindu
- Corresponding Author: Lakkaraju, Himabindu
Other Publications by Himabindu Lakkaraju
Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring
- DOI:
- Publication Date: 2020
- Journal:
- Impact Factor: 0
- Authors: Tom Sühr;Sophie Hilgard;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
Can I Still Trust You?: Understanding the Impact of Distribution Shifts on Algorithmic Recourses
- DOI:
- Publication Date: 2020
- Journal:
- Impact Factor: 0
- Authors: Kaivalya Rawal;Ece Kamar;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
A Non Parametric Theme Event Topic Model for Characterizing Microblogs
- DOI:
- Publication Date: 2011
- Journal:
- Impact Factor: 0
- Authors: Himabindu Lakkaraju;Hyung
- Corresponding Author: Hyung
Let Users Decide: Navigating the Trade-Offs Between Costs and Robustness in Algorithmic Recourse
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: Martin Pawelczyk;Teresa Datta;Johannes van;Gjergji Kasneci;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
- DOI: 10.48550/arxiv.2402.04614
- Publication Date: 2024
- Journal:
- Impact Factor: 0
- Authors: Chirag Agarwal;Sree Harsha Tanneru;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
Other Grants by Himabindu Lakkaraju
Career: Towards a Systematic Characterization of Model Explanations for High-Stakes Decision Making
- Award Number: 2238714
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Continuing Grant
Similar NSFC Grants
Research on Quantum Field Theory without a Lagrangian Description
- Award Number: 24ZR1403900
- Award Year: 2024
- Funding Amount: RMB 0
- Project Type: Provincial/Municipal Project
Cell Research
- Award Number: 31224802
- Award Year: 2012
- Funding Amount: RMB 240,000
- Project Type: Special Fund Project
Cell Research
- Award Number: 31024804
- Award Year: 2010
- Funding Amount: RMB 240,000
- Project Type: Special Fund Project
Cell Research
- Award Number: 30824808
- Award Year: 2008
- Funding Amount: RMB 240,000
- Project Type: Special Fund Project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award Number: 10774081
- Award Year: 2007
- Funding Amount: RMB 450,000
- Project Type: General Program
Similar Overseas Grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award Number: 2312841
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award Number: 2312842
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award Number: 2313131
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award Number: 2313151
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Continuing Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award Number: 2232298
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award Number: 2312840
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award Number: 2345528
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award Number: 2312374
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award Number: 2312373
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award Number: 2232055
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant