Integrative Analysis on Heterogeneous Datasets with High-Dimensional and Non-Standard Models
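Basic Information
- Award number: 1916271
- Principal investigator:
- Amount: $180,000
- Institution:
- Institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-09-01 to 2023-08-31
- Status: Completed
- Source:
- Keywords: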
Project Abstract
Advances in data collection technology over the past decade have enabled practitioners to collect larger and more comprehensive datasets about many natural and social phenomena. Although this trend has yielded new insights, it also comes with caveats that, if not addressed, may lead to erroneous conclusions of the kind that lie at the core of the reproducibility crisis in some areas of science. The caveats include: (i) Modern datasets are growing in heterogeneity, not only as a consequence of the inherent diversity of the world, but also because of the trend of combining data from multiple sources to create more comprehensive datasets. Failing to account for this growing heterogeneity may lead practitioners to systematically biased conclusions. (ii) The size of modern datasets is a hindrance to drawing inferences from them: fitting a standard model to a massive dataset can be computationally intractable. (iii) The comprehensive nature of modern datasets raises privacy and security concerns. These concerns are exacerbated by integrative analysis, which may uncover combinations of patterns across multiple sources that are individually innocuous but jointly identifying. The Principal Investigator aims to address the heterogeneity, size, and privacy/security concerns that arise in integrative analysis of heterogeneous datasets by designing communication-avoiding methods. At a high level, the general approach is to trade local computation for communication: compute lossy summaries of each data source and perform integrative analysis on the summaries. This way, only the summaries are assembled, which reduces communication costs and preserves the anonymity and security of the separate data sources.
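As an illustration of "trading local computation for communication", here is a minimal sketch of generic one-shot averaging; it is not the PI's method. It assumes each site fits a simple linear regression with its own intercept, which absorbs site-level heterogeneity in the baseline. Each site then shares only a lossy two-number summary: its slope estimate and its sample size. The names `LocalSummary`, `summarize_site`, and `combine` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalSummary:
    """Hypothetical lossy summary a site shares instead of its raw rows."""
    slope: float  # local least-squares slope estimate
    n: int        # local sample size, used as the averaging weight


def summarize_site(xs: Sequence[float], ys: Sequence[float]) -> LocalSummary:
    """Fit y = a + b*x by ordinary least squares on one site's data only.

    The site-specific intercept a is fitted locally and never transmitted,
    so differing baselines across sites do not bias the shared slope.
    Requires at least two distinct x values.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return LocalSummary(slope=sxy / sxx, n=n)


def combine(summaries: List[LocalSummary]) -> float:
    """One-shot aggregation: sample-size-weighted average of local slopes."""
    total = sum(s.n for s in summaries)
    return sum(s.slope * s.n for s in summaries) / total
```

For example, with one site whose data follow y = 2x + 1 and another whose data follow y = 2x + 5, each local slope is 2, and `combine` returns 2.0 even though the intercepts differ. Only one round of communication is needed, at the price of discarding everything except the two numbers per site.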
The specific aims of the PI's work include (i) effective computational strategies for distributed computing under heterogeneity in popular high-dimensional models, with provable statistical guarantees, and (ii) new methodological and theoretical insights into data integration for "non-differentiable" statistical problems. In these problems, estimators are obtained either by projecting onto the boundaries of a convex cone or by optimizing discontinuous criterion functions, and such problems arise increasingly in modern domains of research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
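A standard example of an estimator obtained by projection onto a convex cone is isotonic (monotone) least-squares regression. Its fitted vector is the Euclidean projection of the data onto the cone {θ : θ₁ ≤ θ₂ ≤ … ≤ θₙ}. The sketch below computes that projection with the pool-adjacent-violators algorithm (PAVA) for equally weighted observations. It is a textbook illustration of the problem class, not the PI's method, and the function name `isotonic_fit` is hypothetical.

```python
from typing import List, Sequence


def isotonic_fit(y: Sequence[float]) -> List[float]:
    """Least-squares projection of y onto the cone of nondecreasing vectors,
    computed with the pool-adjacent-violators algorithm (equal weights)."""
    # Each block stores [sum of values, count]; its fitted value is sum / count.
    blocks: List[List[float]] = []
    for v in y:
        blocks.append([float(v), 1.0])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    # Expand each block back to one fitted value per observation.
    fit: List[float] = []
    for s, c in blocks:
        fit.extend([s / c] * int(c))
    return fit
```

For example, `isotonic_fit([1, 3, 2, 4])` pools the violating pair (3, 2) into its mean, which gives `[1.0, 2.5, 2.5, 4.0]`. The fitted values lie on the boundary of the cone, since two coordinates are tied. The block structure is also data-dependent, which makes such estimators non-differentiable functions of the data.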
Project Outcomes
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Circumventing superefficiency: An effective strategy for distributed computing in non-standard problems
- DOI: 10.1214/19-ejs1559
- Published: 2019
- Journal:
- Impact factor: 1.1
- Authors: Banerjee, Moulinath; Durot, Cécile
- Corresponding author: Durot, Cécile
Federated Learning with Matched Averaging
- DOI:
- Published: 2020-02
- Journal:
- Impact factor: 0
- Authors: Hongyi Wang; M. Yurochkin; Yuekai Sun; Dimitris Papailiopoulos; Y. Khazaeni
- Corresponding author: Hongyi Wang; M. Yurochkin; Yuekai Sun; Dimitris Papailiopoulos; Y. Khazaeni
Divide and Conquer in Nonstandard Problems and the Super-Efficiency Phenomenon
- DOI: 10.1214/17-aos1633
- Published: 2019-04-01
- Journal:
- Impact factor: 4.5
- Authors: Banerjee, Moulinath; Durot, Cécile; Sen, Bodhisattva
- Corresponding author: Sen, Bodhisattva
Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions
- DOI:
- Published: 2019-12
- Journal:
- Impact factor: 0
- Authors: Subha Maity; Yuekai Sun; M. Banerjee
- Corresponding author: Subha Maity; Yuekai Sun; M. Banerjee
Dirichlet Simplex Nest and Geometric Inference
- DOI:
- Published: 2019-05
- Journal:
- Impact factor: 0
- Authors: M. Yurochkin; Aritra Guha; Yuekai Sun; X. Nguyen
- Corresponding author: M. Yurochkin; Aritra Guha; Yuekai Sun; X. Nguyen
Other Publications by Yuekai Sun
Evaluating the statistical significance of biclusters
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: J. Lee; Yuekai Sun; Jonathan E. Taylor
- Corresponding author: Jonathan E. Taylor
Estimating Fréchet bounds for validating programmatic weak supervision
- DOI: 10.48550/arxiv.2312.04601
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Felipe Maia Polo; M. Yurochkin; Moulinath Banerjee; Subha Maity; Yuekai Sun
- Corresponding author: Yuekai Sun
Friction and adhesion properties of vertically aligned multi-walled carbon nanotube arrays and fluoro-nanodiamond films
- DOI: 10.1016/j.carbon.2008.05.010
- Published: 2008
- Journal:
- Impact factor: 10.9
- Authors: Hao Lu; J. Goldman; F. Ding; Yuekai Sun; M. Pulikkathara; V. Khabashesku; B. Yakobson; J. Lou
- Corresponding author: J. Lou
Debiasing representations by removing unwanted variation due to protected attributes
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Amanda Bower; Laura Niss; Yuekai Sun; Alexander Vargo
- Corresponding author: Alexander Vargo
Supplementary material for Dirichlet Simplex Nest and Geometric Inference
- DOI:
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: M. Yurochkin; Aritra Guha; Yuekai Sun; X. Nguyen
- Corresponding author: X. Nguyen
Other Grants by Yuekai Sun
ATD: Algorithmic Threat Detection and Mitigation with Robust Machine Learning
- Award number: 2027737
- Fiscal year: 2021
- Amount: $180,000
- Project type: Standard Grant
A Transfer Learning Approach to Algorithmic Fairness
- Award number: 2113373
- Fiscal year: 2021
- Amount: $180,000
- Project type: Standard Grant
ATD: Collaborative Research: Statistically Principled Real-Time Detection of Anomalies for Temporal Network Data
- Award number: 1830247
- Fiscal year: 2018
- Amount: $180,000
- Project type: Standard Grant
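Similar NSFC (National Natural Science Foundation of China) Grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Amount:
- Project type: Collaborative Innovation Research Team
Intelligent Patent Analysis for Optimized Technology Stack Selection: Blockchain BusinessRegistry Case Demonstration
- Award number:
- Approval year: 2024
- Amount:
- Project type: Research Fund for International Scientists
Meta-Analysis-Based Modeling of Irrigation-Driven Yield Increases for Cotton in Xinjiang
- Award number: 41601604
- Approval year: 2016
- Amount: 220,000 CNY
- Project type: Young Scientists Fund
Meta-Analysis Methods for Large-Scale Microarray Datasets
- Award number: 31100958
- Approval year: 2011
- Amount: 200,000 CNY
- Project type: Young Scientists Fund
Elucidating the Artemisinin Biosynthetic Pathway with Retrobiosynthetic NMR Analysis
- Award number: 30470153
- Approval year: 2004
- Amount: 220,000 CNY
- Project type: General Program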
Similar International Grants
Risk Factor Analysis and Dynamic Response for Epidemics in Heterogeneous Populations
- Award number: 2344576
- Fiscal year: 2024
- Amount: $180,000
- Project type: Continuing Grant
CAREER: Interpretable Provenance Analysis for Heterogeneous Systems at Scale
- Award number: 2342250
- Fiscal year: 2023
- Amount: $180,000
- Project type: Continuing Grant
Operando-analysis-based design of heterogeneous catalysts for carbon neutrality
- Award number: 23K20034
- Fiscal year: 2023
- Amount: $180,000
- Project type: Fund for the Promotion of Joint International Research (International Leading Research)
Nonlinear Optical Analysis of Molecular Composition and Dynamics within Heterogeneous Assemblies
- Award number: 2305178
- Fiscal year: 2023
- Amount: $180,000
- Project type: Continuing Grant
AI-Powered Uncovering of Mechanisms in Cancer Through Causal Discovery Analysis and Generative Modeling of Heterogeneous Data
- Award number: 10581180
- Fiscal year: 2023
- Amount: $180,000
- Project type:
CAREER: Interpretable Provenance Analysis for Heterogeneous Systems at Scale
- Award number: 2238847
- Fiscal year: 2023
- Amount: $180,000
- Project type: Continuing Grant
Three-dimensional ultrastructural analysis of heterogeneous intercellular connections using correlative volume-imaging
- Award number: 23K09132
- Fiscal year: 2023
- Amount: $180,000
- Project type: Grant-in-Aid for Scientific Research (C)
CAREER: HeteroTime: Accelerating Static Timing Analysis with Intelligent Heterogeneous Parallelism
- Award number: 2349582
- Fiscal year: 2023
- Amount: $180,000
- Project type: Continuing Grant
An Explainable Unified AI Strategy for Efficient and Robust Integrative Analysis of Multi-omics Data from Highly Heterogeneous Multiple Studies
- Award number: 10729965
- Fiscal year: 2023
- Amount: $180,000
- Project type:
Multidimensional Heterogeneous Information Network Analysis and Mining
- Award number: RGPIN-2018-04495
- Fiscal year: 2022
- Amount: $180,000
- Project type: Discovery Grants Program - Individual