CAREER: Causal Modeling for Data Quality and Bias Mitigation

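Basic Information

Award number:
2340124
Principal investigator:
Babak Salimi
Amount:
$600,000
Institution:
University of California-San Diego
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Project period:
2024-07-01 to 2029-06-30
Project status:
Ongoing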

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2340124&HistoricalAwards=false
Keywords:
CAREER Causal Modeling Data Quality

Project Abstract

This project presents a novel approach, inspired by database methodologies, to address the significant challenge of bias in algorithmic systems, particularly in sensitive domains such as credit scoring, medical diagnostics, predictive policing, and the criminal justice system. By recognizing that such biases often stem from the underlying data, the initiative redefines algorithmic bias as a data quality management issue. Emphasizing critical aspects of data quality management such as accuracy, completeness, and consistency, the project aims to develop methods that significantly enhance the trustworthiness and societal impact of these systems. Incorporating causal modeling with these essential data quality principles, it takes a strategic approach to identifying and addressing the root causes of algorithmic bias. This effort not only marks a significant advancement in the field of data science but also contributes substantially to national and public welfare by advocating for decision-making processes that are fair, accurate, and reliable, thereby promoting national health, prosperity, and well-being in a comprehensive manner. This plan envisions a wide-ranging dissemination of its motivation, approach, and artifacts through a diverse array of interdisciplinary colloquia, seminars, and co-curricular learning opportunities.

This project addresses algorithmic bias through a fourfold approach:

1) Developing new, scalable algorithms for data repair, designed for repairing data concerning a special class of integrity constraints that can capture the statistical nuances of data used for training machine learning (ML) models.
2) Establishing a holistic data debiasing framework capable of addressing various data biases and quality issues.
3) Implementing methods to quantify uncertainty in algorithmic decision-making, particularly based on ML models, where the uncertainty stems from bias and data quality issues that cannot be fully recovered and removed due to incomplete information.
4) Lastly, the project focuses on developing methods for root-cause analysis to identify underlying issues and adaptive debiasing in dynamic data environments, incorporating proactive interventions in data processing pipelines for ongoing bias mitigation.

This multifaceted strategy aims to advance the fields of data quality management, data cleaning for ML, and responsible data science, significantly enhancing the reliability, fairness, and accuracy of data-driven decision-making systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Babak Salimi

COMPARISON OF IMMORTALIZATION ASSAY AND POLYMERASE CHAIN REACTION DETECTION OF EPSTEIN-BARR VIRUS IN PEDIATRIC TRANSPLANT RECIPIENTS AND CONTROL SAMPLES

DOI:
10.1080/pdp.21.4.433.443
Publication year:
2002
Journal:
Pediatric Pathology & Molecular Medicine
Impact factor:
0
Authors:
Babak Salimi;E. Alonso;R. Cohn;S. Mendley;B. Katz
Corresponding author:
B. Katz

Causal What-If and How-To Analysis Using HYPER

DOI:
Publication year:
2023
Journal:
Demonstration Track
Impact factor:
0
Authors:
Fangzhu Shen;Kayvon Heravi;Oscar Gomez;Sainyam Galhotra;Amir Gilad;Sudeepa Roy;Babak Salimi
Corresponding author:
Babak Salimi

First Workshop on Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI)

DOI:
10.1145/3626246.3655019
Publication year:
2024
Journal:
Companion of the 2024 International Conference on Management of Data
Impact factor:
0
Authors:
Abolfazl Asudeh;Sainyam Galhotra;Amir Gilad;Babak Salimi;Brit Youngmann
Corresponding author:
Brit Youngmann

Inflammatory Potential of Diet and Odds of Lung Cancer: A Case-Control Study

DOI:
10.1080/01635581.2022.2036770
Publication year:
2022
Journal:
Nutrition and Cancer
Impact factor:
0
Authors:
A. Sadeghi;K. Parastouei;S. Seifi;A. Khosravi;Babak Salimi;H. Zahedi;O. Sadeghi;Hamid Rasekhi;M. Amini
Corresponding author:
M. Amini