CAREER: From Dirty Data to Fair Prediction: Data Preparation Framework for End-to-End Equitable Machine Learning

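Basic information

  • Award number:
    2341055
  • Principal investigator:
  • Amount:
    $558,200
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Start and end dates:
    2024-07-01 to 2029-06-30
  • Project status:
    Not yet concluded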

Project abstract

In an era where AI is becoming integrated into every facet of life, the need for AI systems that respect ethical expectations has never been more crucial. Modern AI algorithms learn from examples, and the creation of more ethical systems should start by supplying better examples to the algorithm. While collecting more quality data is usually very expensive, improving the quality of data by making better choices in the data-preparation stage adds minimal extra cost. This research departs from the current focus of considering ethical goals in the training phase, which is merely a small part of the end-to-end data science lifecycle, and targets the data-preparation pipeline as a strategic opportunity for eliminating unwanted bias and bolstering desirable ethical objectives. The project’s education outreach includes enhancing the understanding of ethical implications among AI students and the wider community and attracting and retraining female talents in the AI field.

This award centers around the critical question: What are the fundamental downstream costs arising from fairness-unaware data preparation and how can we move toward end-to-end fairness through improved data preparation? Employing an information-theoretic lens, the PI will investigate how biased information flows from the original, dirty data to the clean training set, to the trained prediction model through the data-preparation pipeline. Specifically, the PI delves into prevalent real-world dataset problems, such as missing values, heterogeneity, and data imbalance, to examine how bias can be either amplified or mitigated while handling these issues. Motivated by this analysis, the project will devise fairness-aware data imputation, data encoding, and data-balancing techniques that can attain end-to-end ethical goals more effectively and efficiently.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
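
As a rough illustration of the abstract's point that handling missing values is not neutral with respect to groups, the sketch below compares two common imputation choices on a toy dataset. It is not the PI's method: the records, the group labels `A` and `B`, and the helper names `impute_global`, `impute_by_group`, and `group_mean` are all hypothetical, and the gap between group means is only a crude stand-in for the information-theoretic measures the abstract mentions.

```python
from statistics import mean

# Hypothetical toy records: (group, score); None marks a missing score.
# Only group B has missing values in this made-up example.
records = [
    ("A", 80.0), ("A", 90.0), ("A", 85.0), ("A", 95.0),
    ("B", 60.0), ("B", None), ("B", None), ("B", 70.0),
]

def impute_global(rows):
    """Fill each missing score with the mean of all observed scores."""
    fill = mean(s for _, s in rows if s is not None)
    return [(g, fill if s is None else s) for g, s in rows]

def impute_by_group(rows):
    """Fill each missing score with the observed mean of the same group."""
    fills = {}
    for group in {g for g, _ in rows}:
        fills[group] = mean(s for g, s in rows if g == group and s is not None)
    return [(g, fills[g] if s is None else s) for g, s in rows]

def group_mean(rows, group):
    """Mean score of one group after imputation."""
    return mean(s for g, s in rows if g == group)

global_rows = impute_global(records)
group_rows = impute_by_group(records)

# Gap between the group means that a downstream model would see.
gap_global = group_mean(global_rows, "A") - group_mean(global_rows, "B")
gap_group = group_mean(group_rows, "A") - group_mean(group_rows, "B")

print("gap after global-mean imputation:", gap_global)
print("gap after group-wise imputation:", gap_group)
```

On this toy data the two choices leave different gaps between the groups (15.0 versus 22.5), even though the observed values are identical. Neither choice is claimed to be the fair one here; the sketch only shows that the imputation step changes the group statistics passed on to training.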

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Haewon Jeong

Coded 2.5D SUMMA: Coded Matrix Multiplication for High Performance Computing
  • DOI:
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haewon Jeong;Yaoqing Yang;Vipul Gupta;V. Cadambe;K. Ramchandran;P. Grover
  • Corresponding author:
    P. Grover
Achieving information capacity of EEG-based brain-computer interfaces using high-density EEG sensing
  • DOI:
  • Publication date:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    P. Grover;J. Weldon;S. Kelly;Praveen Venkatesh;Haewon Jeong
  • Corresponding author:
    Haewon Jeong
An information theoretic technique for harnessing attenuation of high spatial frequencies to design ultra-high-density EEG
Fully-Decentralized Coded Computing for Reliable Large-Scale Computing
  • DOI:
    10.1184/r1/12000657.v1
  • Publication date:
    2020-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haewon Jeong
  • Corresponding author:
    Haewon Jeong
Energy-adaptive codes

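Similar NSFC (National Natural Science Foundation of China) grants

How can dirty work not be dirty? A study of the conceptual structure, mechanisms, and intervention strategies of dirty work in the Chinese context
  • Award number:
    71972149
  • Approval year:
    2019
  • Funding amount:
    CNY 500,000
  • Award type:
    General Program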

Similar overseas grants

Investigating the skin-immune system in dirty mice
  • Award number:
    10646827
  • Fiscal year:
    2023
  • Funding amount:
    $558,200
  • Award type:
Vaccine-Induced Mucosal T-Cell Immunity to Respiratory Viruses in Dirty Mice
  • Award number:
    10746925
  • Fiscal year:
    2023
  • Funding amount:
    $558,200
  • Award type:
Development of FAST-DOSE assay system for the rapid assessment of acute radiation exposure, individual radiosensitivity and injury in victims for a large-scale radiological incident
  • Award number:
    10784562
  • Fiscal year:
    2023
  • Funding amount:
    $558,200
  • Award type:
Thioredoxin, a novel agent for mitigating radiation-induced hematopoietic injury
  • Award number:
    10687418
  • Fiscal year:
    2022
  • Funding amount:
    $558,200
  • Award type:
Integrated use of genomics, metabolomics, and cytokine profiling to validate the use of 'dirty' mice to study sepsis pathophysiology
  • Award number:
    10257687
  • Fiscal year:
    2021
  • Funding amount:
    $558,200
  • Award type: