BIGDATA: F: Reliable Inference with Big Data: Reproducibility, Data Sharing, Heterogeneity

Basic information

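  • Award number:
    1741162
  • Principal investigator:
    Andrea Montanari
  • Amount:
    $650,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Period:
    2017-09-01 to 2021-08-31
  • Project status:
    Completed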

Project abstract

Over the last decade, 'big data' technologies have made it possible to acquire vast amounts of data (e.g. through smartphones) and to accumulate them into large-scale databases. Powerful hardware and software systems have been developed to crunch these data and extract statistical models. For instance, the outcome of a certain medical procedure can be modeled in terms of the features of the patient, thus in principle providing a personalized risk score for that procedure. Unfortunately, the increasing complexity of these data and of the algorithms used has made statistical models significantly less transparent. How certain are we of these statistical predictions? What is their limit of validity? How biased is the resulting model?

This project focuses on four main challenges that are ubiquitous in big data and crucial to extracting reliable insights: reproducibility, data sharing, missing data, and data heterogeneity.

(1) Reproducibility requires being able to compare two models extracted from different data sets (e.g. after additional data have been accumulated). This in turn is impossible unless we have reliable procedures to quantify uncertainty and confidence in complex high-dimensional models. Recently proposed ideas in this direction are still insufficient to cope with realistic large-scale applications.

(2) Data sharing is a key feature of modern data analysis, whereby a single massive data set is studied by hundreds of independent researchers. Unguarded statistical inference by such a population of researchers unavoidably leads to large numbers of false discoveries. The project builds on false discovery rate (FDR) controlling methods to propose safe approaches for decentralized data analysis.

(3) Missing data are ubiquitous in big data. While several methods have been developed in the past to deal with missing data, it is unclear to what extent they are applicable to modern scenarios. The project aims to develop principled guidelines based on a rigorous comparison of various approaches, and to develop new algorithms based on maximum likelihood.

(4) Data heterogeneity. Big data are often produced by aggregating multiple data sources. How can we prevent standard statistical procedures from being critically affected by such heterogeneity? The project uses new regularization schemes to fuse information across multiple sources.
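The data-sharing challenge (2) rests on false discovery rate control. To make "FDR-controlling" concrete, here is a minimal Python sketch of the classical Benjamini-Hochberg step-up procedure. It is a textbook baseline, not the decentralized method developed in this project; the function name `benjamini_hochberg` and the example p-values are hypothetical. BH controls the FDR at level `q` when the p-values are independent (or satisfy certain positive-dependence conditions).

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.1) -> List[int]:
    """Return indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at target false discovery rate q (hypothetical helper)."""
    m = len(p_values)
    if m == 0:
        return []
    # Sort hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Hypothetical p-values from ten independent tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))               # [0, 1]
print([i for i, pv in enumerate(p) if pv <= 0.05])  # naive cutoff: [0, 1, 2, 3, 4]
```

With these ten tests, a naive per-test cutoff of 0.05 declares five discoveries, while BH at q = 0.05 keeps only the two smallest p-values. The gap illustrates how unguarded inference inflates the number of claimed findings.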

Project outputs

Journal articles (28)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Learning with invariances in random features and kernel models
  • DOI:
  • Published:
    2021-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Song Mei;Theodor Misiakiewicz;A. Montanari
  • Corresponding author:
    Song Mei;Theodor Misiakiewicz;A. Montanari
Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function”
  • DOI:
    10.1214/19-aos1910
  • Published:
    2020-08
  • Journal:
  • Impact factor:
    0
  • Authors:
    B. Ghorbani;Song Mei;Theodor Misiakiewicz;A. Montanari
  • Corresponding author:
    B. Ghorbani;Song Mei;Theodor Misiakiewicz;A. Montanari
When do neural networks outperform kernel methods?
Streaming Belief Propagation for Community Detection
  • DOI:
  • Published:
    2021-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuchen Wu;M. Bateni;André Linhares;Filipe Almeida;A. Montanari;A. Norouzi-Fard;Jakab Tardos
  • Corresponding author:
    Yuchen Wu;M. Bateni;André Linhares;Filipe Almeida;A. Montanari;A. Norouzi-Fard;Jakab Tardos
Optimization of the Sherrington--Kirkpatrick Hamiltonian
  • DOI:
    10.1137/20m132016x
  • Published:
    2021
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    Montanari, Andrea
  • Corresponding author:
    Montanari, Andrea

Other publications by Andrea Montanari

A sensor-based study on the environmental determinants of sleep in older adults
  • DOI:
    10.1016/j.envres.2025.120874
  • Published:
    2025-06-01
  • Journal:
  • Impact factor:
    7.700
  • Authors:
    Andrea Montanari;Giovanna Fancello;Cédric Sueur;Yan Kestens;Frank J. van Lenthe;Basile Chaix
  • Corresponding author:
    Basile Chaix
Understanding Inverse Scaling and Emergence in Multitask Representation Learning
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. E. Ildiz;Zhe Zhao;Samet Oymak
Provably Efficient Posterior Sampling for Sparse Linear Regression via Measure Decomposition
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Andrea Montanari;Yuchen Wu
  • Corresponding author:
    Yuchen Wu
Optimization of random cost functions and statistical physics
  • DOI:
  • Published:
    2024-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Andrea Montanari
  • Corresponding author:
    Andrea Montanari
Tractability from overparametrization: the example of the negative perceptron
  • DOI:
    10.1007/s00440-023-01248-y
  • Published:
    2024
  • Journal:
  • Impact factor:
    2
  • Authors:
    Andrea Montanari;Yiqiao Zhong;Kangjie Zhou
  • Corresponding author:
    Kangjie Zhou

Other grants by Andrea Montanari

CIF: Small: Learning and estimation with rough non-convex objectives: Fundamental limits and efficient algorithms
  • Award number:
    2006489
  • Fiscal year:
    2020
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
Workshop: Advances in Asymptotic Probability
  • Award number:
    1839440
  • Fiscal year:
    2018
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
CIF:Small:Information-theoretic and Computational Thresholds in Statistical Learning
  • Award number:
    1714305
  • Fiscal year:
    2017
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
CIF: Small: Optimal Iterative Estimation in Signal Processing, Information Theory and Machine Learning
  • Award number:
    1319979
  • Fiscal year:
    2013
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
The game dynamics of social interaction: Algorithms and applications
  • Award number:
    0915145
  • Fiscal year:
    2009
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
CAREER: New Information Processing Techniques from Statistical Physics and Probability Theory
  • Award number:
    0743978
  • Fiscal year:
    2008
  • Funding amount:
    $650,000
  • Project category:
    Continuing Grant

Similar international grants

Enabling Reliable Testing Of SMLM Datasets
  • Award number:
    BB/X01858X/1
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Research Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Award number:
    2348261
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
CRII: RI: Deep neural network pruning for fast and reliable visual detection in self-driving vehicles
  • Award number:
    2412285
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
RITA: Reliable and Efficient Task Management in Edge Computing for AIoT Systems
  • Award number:
    EP/Y015886/1
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Fellowship
A Novel Contour-based Machine Learning Tool for Reliable Brain Tumour Resection (ContourBrain)
  • Award number:
    EP/Y021614/1
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Research Grant
Economic & Reliable DC Microgrids
  • Award number:
    EP/Y034619/1
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Fellowship
CAREER: Graded and Reliable Aerosol Deposition for Electronics (GRADE): Understanding Multi-Material Aerosol Jet Printing with In-Line Mixing
  • Award number:
    2336356
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
STTR Phase I: A Reliable and Efficient New Method for Satellite Attitude Control
  • Award number:
    2310323
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Standard Grant
Towards an Explainable, Efficient, and Reliable Federated Learning Framework: A Solution for Data Heterogeneity
  • Award number:
    24K20848
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Grant-in-Aid for Early-Career Scientists
CAREER: Speedy and Reliable Approximate Queries in Hybrid Transactional/Analytical Systems
  • Award number:
    2339596
  • Fiscal year:
    2024
  • Funding amount:
    $650,000
  • Project category:
    Continuing Grant