SHF: Medium: Reasoning about Multiplicity in the Machine Learning Pipeline

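Basic information

Award number:
2402833
Principal investigator:
Loris D'Antoni
Amount:
$1.2 million
Host institution:
University of Wisconsin-Madison
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Start and end dates:
2024-10-01 to 2027-09-30
Project status:
Ongoing

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2402833&HistoricalAwards=false
Keywords:
SHF Medium Reasoning about Multiplicity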

Project abstract

Machine learning is deployed across various domains (e.g., finance, education, hiring) with the assumption that model outcomes are accurate and authoritative. In reality, the specific model that is deployed is just one option of many: previous work has shown that multiplicity, the existence of multiple equally good models, arises at many stages of the machine learning pipeline. Formally reasoning about multiplicity is challenging because of the large (potentially infinite) set of models one has to take into account. As a result, existing techniques can only reason about certain forms of model-based multiplicity, and generally only with empirical guarantees. This project's novelties are a set of approaches that increase the auditability of machine learning pipelines. These approaches consist of frameworks and formal techniques for understanding how multiplicity in the dataset creation and modeling processes impacts the final learned model that is deployed. The project's impacts are especially prominent in domains where the decisions of machine-learned models directly affect humans: understanding multiplicity is vital for developing machine learning models that are fair and robust. The investigators are involved in organizing outreach programs that expose high schoolers and undergraduates from underrepresented backgrounds to computer science and topics in machine learning.

This project investigates multiplicity for diverse model architectures across the whole machine learning pipeline, including training data, model predictions, and model explanations. The research integrates formal methods and robust machine learning to provide techniques that help answer the question of whether machine learning outcomes are reliable or just an artifact of multiplicity. For instance, the investigators study algorithms to certify (deterministically or probabilistically, depending on the model architecture) whether a model's prediction is robust under various sources of multiplicity.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
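
To make the idea of certifying a prediction under model multiplicity concrete, the following is a minimal illustrative sketch and not the project's method. It assumes a deliberately tiny hypothesis class (one-dimensional threshold classifiers that predict 1 when x >= t), a toy training set, and a tolerance `epsilon` that defines which models count as "equally good". The names `training_accuracy`, `possible_predictions`, `is_certified`, and `TRAIN` are hypothetical. Because every threshold inside an interval between adjacent training points behaves identically on the training data, the near-optimal set can be enumerated exactly, interval by interval.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, int]  # (feature value, label in {0, 1})


def training_accuracy(threshold: float, data: List[Point]) -> float:
    """Accuracy of the rule 'predict 1 iff x >= threshold' on the training data."""
    correct = sum(1 for x, y in data if int(x >= threshold) == y)
    return correct / len(data)


def possible_predictions(x_test: float, data: List[Point], epsilon: float) -> Set[int]:
    """Labels that some near-optimal threshold model assigns to x_test.

    A model is near-optimal (an assumption of this sketch) if its training
    accuracy is within epsilon of the best achievable accuracy.
    """
    xs = sorted({x for x, _ in data})
    bounds = [-math.inf] + xs + [math.inf]
    # Every threshold t in the half-open interval (a, b] labels the training
    # points identically, so one representative (t = b) gives its accuracy.
    intervals = list(zip(bounds, bounds[1:]))
    best = max(training_accuracy(b, data) for _, b in intervals)

    labels: Set[int] = set()
    for a, b in intervals:
        if training_accuracy(b, data) < best - epsilon:
            continue  # this group of models is not "equally good"
        if x_test > a:
            labels.add(1)  # some t in (a, b] satisfies t <= x_test
        if x_test < b:
            labels.add(0)  # some t in (a, b] satisfies t > x_test
    return labels


def is_certified(x_test: float, data: List[Point], epsilon: float) -> bool:
    """True if all near-optimal models agree on the label of x_test."""
    return len(possible_predictions(x_test, data, epsilon)) == 1


# Toy training set (hypothetical): labels switch from 0 to 1 between 3 and 4.
TRAIN: List[Point] = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

if __name__ == "__main__":
    print(sorted(possible_predictions(5.0, TRAIN, 0.0)))  # [1]
    print(sorted(possible_predictions(3.5, TRAIN, 0.0)))  # [0, 1]
    print(sorted(possible_predictions(4.5, TRAIN, 0.2)))  # [0, 1]
    print(is_certified(6.0, TRAIN, 0.2))                  # True
```

In this toy setting the answer is exact because the hypothesis class is small enough to enumerate. For realistic model classes the set of equally good models is large or infinite, which is the difficulty the abstract describes.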
