CAREER: Information-Theoretic Foundations of Fairness in Machine Learning

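Basic Information

Award number:
1845852
Principal investigator:
Flavio Calmon
Amount:
$547,900 (approx.)
Host institution:
Harvard University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-02-01 to 2024-07-31
Project status:
Completed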

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1845852&HistoricalAwards=false
Keywords:
CAREER Information Theoretic Foundations Fairness

Project Abstract

Machine learning algorithms can identify complex patterns in very large datasets. These algorithms are increasingly used in applications of significant social consequence, such as loan approval, hiring, and bail and sentencing decisions. However, real-world data may reflect discrimination patterns that exist in society at large. Consequently, decisions based on algorithms that learn from data are at risk of inheriting and, ultimately, reinforcing discriminatory and unfair social biases. This project aims to precisely characterize the operational limits of discrimination discovery and control in machine learning by combining legal and social science definitions of fairness with powerful mathematical tools from information theory, statistics, and optimization. This cross-disciplinary effort aims to provide fundamental theory and design guidelines for data scientists and engineers who will create the next generation of fair data-driven algorithms and applications. The technical results of this project will also inform the debate surrounding the social impact of machine learning. Moreover, this research will be used as a vessel for engaging students and researchers from diverse backgrounds in the applicability of information theory, machine learning, optimization, and, more broadly, math and engineering to social challenges.

Automated methods for discovering and controlling discrimination in machine learning inherently face a trade-off between fairness and accuracy, and are limited by the dimensionality of the underlying data. This project creates a comprehensive information-theoretic framework that captures the limits of discrimination control by determining (i) how to systematically identify data features that may lead to discrimination; (ii) how to ensure fairness by producing new, information-theoretically grounded data representations; (iii) the fundamental information-theoretic trade-offs between fairness, distortion, and accuracy; and (iv) the impact of finite samples in discrimination detection and mitigation. The key advantage of the information-theoretic methodology adopted in this project is that it captures fundamental, algorithm-independent properties of discrimination, while being fertile ground for the development of novel mathematical tools and models relevant to both data scientists and information theorists. The theoretical component of this research weaves new connections between information theory and robust statistics by analyzing the impact of local perturbations of probability distributions on discrimination metrics, and creates new information-theoretic models useful in discrimination control, privacy, and representation learning. The applied component of this research develops robust, data-driven methods for measuring and mitigating discrimination that are immediately relevant for fair algorithmic decision-making in applications of consequence.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (33)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Polynomial Approximations of Conditional Expectations in Scalar Gaussian Channels

DOI:
10.1109/isit45174.2021.9517932
Publication year:
2021
Journal:
2021 IEEE International Symposium on Information Theory (ISIT)
Impact factor:
0
Authors:
Alghamdi, Wael; Calmon, Flavio P.
Corresponding author:
Calmon, Flavio P.

Model Projection: Theory and Applications to Fair Machine Learning

DOI:
10.1109/isit44484.2020.9173988
Publication year:
2020
Journal:
Proc. of the 2020 IEEE International Symposium on Information Theory (ISIT)
Impact factor:
0
Authors:
Alghamdi, Wael; Asoodeh, Shahab; Wang, Hao; Calmon, Flavio P.; Wei, Dennis; Ramamurthy, Karthikeyan Natesan
Corresponding author:
Ramamurthy, Karthikeyan Natesan

A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via f-Divergences

DOI:
10.1109/isit44484.2020.9174015
Publication year:
2020
Journal:
Proc. of the 2020 IEEE International Symposium on Information Theory (ISIT)
Impact factor:
0
Authors:
Asoodeh, Shahab; Liao, Jiachun; Calmon, Flavio P.; Kosut, Oliver; Sankar, Lalitha
Corresponding author:
Sankar, Lalitha

Cactus Mechanisms: Optimal Differential Privacy Mechanisms in the Large-Composition Regime

DOI:
10.1109/isit50566.2022.9834438
Publication year:
2022
Journal:
IEEE International Symposium on Information Theory
Impact factor:
0
Authors:
Alghamdi, Wael; Asoodeh, Shahab; Calmon, Flavio P.; Kosut, Oliver; Sankar, Lalitha; Wei, Fei
Corresponding author:
Wei, Fei

ϵ-Approximate Coded Matrix Multiplication Is Nearly Twice as Efficient as Exact Multiplication
