AF: Medium: Collaborative Proposal: Foundations of Adaptive Data Analysis

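Basic information

  • Award number:
    1763665
  • Principal investigator:
  • Amount:
    $286,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-03-01 to 2021-05-31
  • Project status:
    Completed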

Project abstract

Classical tools for rigorously analyzing data assume that the analysis is static: the models and the hypotheses to be tested are fixed independently of the data, and preliminary analysis of the data does not feed back into the data-gathering procedure. Modern data analysis, by contrast, is highly adaptive. Much of modern machine learning performs model selection as a function of the data by iteratively tuning hyper-parameters, and exploratory data analysis is conducted to suggest hypotheses, which are then validated on the same data sets used to discover them. This kind of adaptivity is often referred to as p-hacking and is blamed in part for the surprising prevalence of non-reproducible science in some empirical fields. This project aims to develop rigorous tools and methodologies to perform statistically valid data analysis in the adaptive setting, drawing on techniques from statistics, information theory, differential privacy, and stable algorithm design. The technical goals of this project include developing: 1) information-theoretic measures that characterize the degree to which a worst-case data analysis can over-fit, given an interaction with a dataset; 2) models for data analysts that move beyond the worst-case setting; and 3) empirical investigations that bridge the gap between theory and practice. The problem of adaptive data analysis (also called post-selection inference, or selective inference) has attracted attention in both computer science and statistics over the past several years, but from relatively disjoint communities. Part of the aim of this project is to integrate these two lines of work. The team of researchers on this project spans departments of computer science, statistics, and biomedical data science. Beyond unifying these two areas, the broader impacts of this research will be to make science more reliable and to reduce the prevalence of "over-fitting" and "false discovery."
The project also has a significant outreach and education component, and will educate graduate students, organize workshops, and produce expository materials. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
How Does Mixup Help With Robustness and Generalization?
  • DOI:
  • Published:
    2020-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Linjun Zhang;Zhun Deng;Kenji Kawaguchi;Amirata Ghorbani;James Y. Zou
  • Corresponding authors:
    Linjun Zhang;Zhun Deng;Kenji Kawaguchi;Amirata Ghorbani;James Y. Zou
Interpreting Robust Optimization via Adversarial Influence Functions
  • DOI:
  • Published:
    2020-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhun Deng;C. Dwork;Jialiang Wang;Linjun Zhang
  • Corresponding authors:
    Zhun Deng;C. Dwork;Jialiang Wang;Linjun Zhang
Abstracting Fairness: Oracles, Metrics, and Interpretability
Composable and versatile privacy via truncated CDP
The Fienberg Problem: How to Allow Human Interactive Data Analysis in the Age of Differential Privacy

Other publications by Cynthia Dwork

Distributed computing column
  • DOI:
    10.1145/235666.235671
  • Published:
    1989
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cynthia Dwork
  • Corresponding author:
    Cynthia Dwork
Complexity-Theoretic Implications of Multicalibration
Beyond Bernoulli: Generating Random Outcomes that cannot be Distinguished from Nature
Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem
  • DOI:
    10.48550/arxiv.2406.06581
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Reid McIlroy;Katrina Brown;Conlan Olson;Linjun Zhang;Cynthia Dwork
  • Corresponding author:
    Cynthia Dwork
Workshop Proposal: Statistical and Learning-Theoretic Challenges in Data Privacy
  • DOI:
  • Published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cynthia Dwork;Aleksandra Slavković
  • Corresponding author:
    Aleksandra Slavković

Similar overseas grants

Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
  • Award number:
    2402836
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
  • Award number:
    2402851
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Algorithms Meet Machine Learning: Mitigating Uncertainty in Optimization
  • Award number:
    2422926
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Fast Combinatorial Algorithms for (Dynamic) Matchings and Shortest Paths
  • Award number:
    2402283
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
  • Award number:
    2402852
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Fast Combinatorial Algorithms for (Dynamic) Matchings and Shortest Paths
  • Award number:
    2402284
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
  • Award number:
    2402837
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
  • Award number:
    2402835
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Adventures in Flatland: Algorithms for Modern Memories
  • Award number:
    2423105
  • Fiscal year:
    2024
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant
Collaborative Research: AF: Medium: Sketching for privacy and privacy for sketching
  • Award number:
    2311649
  • Fiscal year:
    2023
  • Award amount:
    $286,000
  • Project type:
    Continuing Grant