SHF: Small: Collaborative Research: Programming Tools for Adaptive Data Analysis

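Basic Information

Award number:
1718088
Principal investigator:
Jonathan Ullman
Amount:
Approx. $224,400
Institution:
Northeastern University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2017
Funding country:
United States
Period:
2017-08-01 to 2020-07-31
Status:
Completed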

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1718088&HistoricalAwards=false
Keywords:
SHF Small Collaborative Research Programming

Project Abstract

False discovery, or overfitting, occurs when an empirical researcher draws a conclusion from a dataset and that conclusion does not generalize to new data. Although there are many statistical methods for preventing false discovery, most are designed for static data analysis, where a dataset is used only once. However, modern data analysis is adaptive, and the same datasets are often reused for multiple studies by multiple researchers. Statisticians have identified adaptivity as one cause of non-reproducible research, and this project's broader significance and importance will be to begin addressing this problem. Specifically, this project will build a prototype programming tool for preventing false discovery arising from adaptive data analysis. The intellectual merits are to incorporate and extend recent theoretical advances on this problem into a programming framework that allows researchers to analyze datasets adaptively with robust guarantees that overfitting will not occur.

The project builds on a surprising recent connection between false discovery and differential privacy, a robust statistical guarantee that emerged recently to protect the privacy of sensitive data. This line of work shows that when data is analyzed in a differentially private way, false discoveries cannot occur. Differential privacy is also programmable: it allows complex differentially private algorithms to be built from simple components, so it is an ideal programming framework for adaptive data analysis. This project is extending existing differentially private programming frameworks to adaptive data analysis. The PIs are also developing new algorithmic and programming-language tools for adaptive data analysis, and incorporating them into the first prototype system for this application.

Project Outcomes

Journal papers (14)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

The power of factorization mechanisms in local and central differential privacy

DOI:
10.1145/3357713.3384297
Published:
2019-11
Venue:
Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing
Impact factor:
0
Authors:
Alex Edmonds; Aleksandar Nikolov; Jonathan Ullman
Corresponding authors:
Alex Edmonds; Aleksandar Nikolov; Jonathan Ullman

Differentially Private Fair Learning

DOI:
Published:
2018-12
Venue:
ArXiv
Impact factor:
0
Authors:
Matthew Jagielski; Michael Kearns; Jieming Mao; Alina Oprea; Aaron Roth; Saeed Sharifi-Malvajerdi; Jonathan Ullman
Corresponding authors:
Matthew Jagielski; Michael Kearns; Jieming Mao; Alina Oprea; Aaron Roth; Saeed Sharifi-Malvajerdi; Jonathan Ullman

Private Identity Testing for High-Dimensional Distributions

DOI:
Published:
2020
Venue:
Advances in Neural Information Processing Systems
Impact factor:
0
Authors:
Canonne, Clement; Kamath, Gautam; McMillan, Audra; Ullman, Jonathan; Zakynthinou, Lydia
Corresponding author:
Zakynthinou, Lydia

Efficient Private Algorithms for Learning Large-Margin Halfspaces
