Collaborative Research: RI: Small: Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness
Basic Information
- Award Number: 2008461
- Principal Investigator:
- Amount: $225,000
- Host Institution:
- Host Institution Country: United States
- Award Type: Standard Grant
- Fiscal Year: 2020
- Funding Country: United States
- Project Period: 2020-10-01 to 2024-09-30
- Project Status: Completed
- Source:
- Keywords:
Project Abstract
The successful adoption of machine learning (ML) models in critical domains such as healthcare and criminal justice relies heavily on how well decision makers are able to understand and trust the functionality of these models. However, the proprietary nature and increasing complexity of ML models make it challenging for domain experts to understand these complex "black boxes". Consequently, there has been a recent surge in techniques that explain black-box models in a human-interpretable manner by approximating them using simpler models. However, it is unclear to what extent these post hoc explanation techniques may mislead end users by giving them a false sense of security, luring them into trusting and deploying untrustworthy black boxes. This project will build rigorous frameworks to expose the vulnerabilities of existing explanation techniques, assess how these vulnerabilities can manifest in real-world applications, and develop new techniques to defend against them. This project has the potential to significantly speed up the adoption of ML in a variety of domains including criminal justice (e.g., bail decisions), health care (e.g., patient diagnosis and treatment), and financial lending (e.g., loan approval).
The goal of this project is to characterize the vulnerabilities of existing explanation techniques, understand how adversaries can exploit these vulnerabilities, and develop techniques to defend against them. The project will focus on the following subtasks: 1) understanding the real-world consequences of misleading explanations by conducting user studies and detailed interviews with domain experts in healthcare and criminal justice; 2) identifying critical vulnerabilities in state-of-the-art explanation techniques that can be exploited by adversarial entities to generate misleading explanations; and 3) developing novel techniques for building robust and reliable explanations that are not prone to these vulnerabilities and thereby provide domain experts and other stakeholders with faithful explanations of complex black-box models. With these contributions, the project will initiate a new body of research in ML interpretability that focuses on understanding how adversaries can manipulate explanation techniques and how to defend against such attacks.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
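The abstract turns on one technical idea: post hoc methods such as LIME explain a black box by fitting a simpler surrogate model in a neighborhood of an input, and those surrogate fits can be unstable or manipulable. The sketch below is a minimal illustration of that idea, assuming a scikit-learn-style setup; the helper `local_surrogate_weights` and its parameters (`sigma`, `n_samples`) are illustrative inventions, not the project's code. It fits a proximity-weighted linear surrogate around a point and checks whether a small input perturbation changes the top-ranked feature, which is the kind of fragility the project proposes to expose and defend against.

```python
# Minimal sketch (illustrative, not the project's code): a LIME-style local
# surrogate explanation and a check of its stability under small perturbations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)  # the "black box"

def local_surrogate_weights(x, n_samples=1000, sigma=0.5):
    """Fit a proximity-weighted linear surrogate to the black box around x."""
    Z = x + sigma * rng.standard_normal((n_samples, x.size))  # local samples
    preds = black_box.predict_proba(Z)[:, 1]                  # black-box outputs
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-dists**2 / (2 * sigma**2))              # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_                                    # feature attributions

x0 = X[0]
e_orig = local_surrogate_weights(x0)
e_pert = local_surrogate_weights(x0 + 0.05 * rng.standard_normal(x0.size))
print("top feature (original): ", int(np.argmax(np.abs(e_orig))))
print("top feature (perturbed):", int(np.argmax(np.abs(e_pert))))
```

If the two printed indices disagree, a perturbation of the input changed which feature the surrogate ranks as most important, one concrete way a post hoc explanation can mislead its audience.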
Project Outcomes
Journal Articles (18)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations
- DOI: 10.1145/3514094.3534159
- Publication Date: 2022-05
- Journal:
- Impact Factor: 0
- Authors: Jessica Dai;Sohini Upadhyay;U. Aïvodji;Stephen H. Bach;Himabindu Lakkaraju
- Corresponding Author: Jessica Dai;Sohini Upadhyay;U. Aïvodji;Stephen H. Bach;Himabindu Lakkaraju
Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis
- DOI:
- Publication Date: 2021-06
- Journal:
- Impact Factor: 0
- Authors: Martin Pawelczyk;Chirag Agarwal;Shalmali Joshi;Sohini Upadhyay;Himabindu Lakkaraju
- Corresponding Author: Martin Pawelczyk;Chirag Agarwal;Shalmali Joshi;Sohini Upadhyay;Himabindu Lakkaraju
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
- DOI:
- Publication Date: 2020-08
- Journal:
- Impact Factor: 0
- Authors: Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
- Corresponding Author: Dylan Slack;Sophie Hilgard;Sameer Singh;Himabindu Lakkaraju
OpenXAI: Towards a Transparent Evaluation of Model Explanations
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Agarwal, Chirag;Krishna, Satyapriya;Saxena, Eshika;Pawelczyk, Martin;Johnson, Nari;Puri, Isha;Zitnik, Marinka;Lakkaraju, Himabindu
- Corresponding Author: Lakkaraju, Himabindu
Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten
- DOI:
- Publication Date: 2023
- Journal:
- Impact Factor: 0
- Authors: Krishna, Satyapriya;Ma, Jiaqi;Lakkaraju, Himabindu
- Corresponding Author: Lakkaraju, Himabindu
Other Publications by Himabindu Lakkaraju
Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring
- DOI:
- Publication Date: 2020
- Journal:
- Impact Factor: 0
- Authors: Tom Sühr;Sophie Hilgard;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
Can I Still Trust You?: Understanding the Impact of Distribution Shifts on Algorithmic Recourses
- DOI:
- Publication Date: 2020
- Journal:
- Impact Factor: 0
- Authors: Kaivalya Rawal;Ece Kamar;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
A Non Parametric Theme Event Topic Model for Characterizing Microblogs
- DOI:
- Publication Date: 2011
- Journal:
- Impact Factor: 0
- Authors: Himabindu Lakkaraju;Hyung
- Corresponding Author: Hyung
Let Users Decide: Navigating the Trade-Offs Between Costs and Robustness in Algorithmic Recourse
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: Martin Pawelczyk;Teresa Datta;Johannes van;Gjergji Kasneci;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
- DOI: 10.48550/arxiv.2402.04614
- Publication Date: 2024
- Journal:
- Impact Factor: 0
- Authors: Chirag Agarwal;Sree Harsha Tanneru;Himabindu Lakkaraju
- Corresponding Author: Himabindu Lakkaraju
Other Grants by Himabindu Lakkaraju
Career: Towards a Systematic Characterization of Model Explanations for High-Stakes Decision Making
- Award Number: 2238714
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Continuing Grant
Similar NSFC Grants
Research on Quantum Field Theory without a Lagrangian Description
- Award Number: 24ZR1403900
- Award Year: 2024
- Funding Amount: RMB 0
- Project Type: Provincial/Municipal Project
Cell Research
- Award Number: 31224802
- Award Year: 2012
- Funding Amount: RMB 240,000
- Project Type: Special Fund Project
Cell Research
- Award Number: 31024804
- Award Year: 2010
- Funding Amount: RMB 240,000
- Project Type: Special Fund Project
Cell Research
- Award Number: 30824808
- Award Year: 2008
- Funding Amount: RMB 240,000
- Project Type: Special Fund Project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award Number: 10774081
- Award Year: 2007
- Funding Amount: RMB 450,000
- Project Type: General Program
Similar Overseas Grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award Number: 2312841
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award Number: 2312842
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award Number: 2313131
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award Number: 2313151
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Continuing Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award Number: 2232298
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award Number: 2312840
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award Number: 2345528
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award Number: 2312374
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award Number: 2312373
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award Number: 2232055
- Fiscal Year: 2023
- Funding Amount: $225,000
- Award Type: Standard Grant