Statistical and Computational Tools for Analyzing High-Dimensional Heterogeneous Data

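Basic Information

  • Award number:
    2210907
  • Principal investigator:
  • Amount:
    $180,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-08-01 to 2025-07-31
  • Project status:
    Ongoing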

Project Abstract

Modern technologies generate tremendous volumes of data in diverse forms. Such high-throughput data inevitably come with great heterogeneity and an enormous amount of noise. For instance, a large-scale genetic study typically involves people with many different attributes; a social network usually consists of multiple hidden communities whose internal connections are denser than their external ones. While the raw features have high ambient dimension (for example, thousands of genes), the intrinsic structures often exhibit low complexity (for example, the latitude and longitude of an individual's geographic location). Precise extraction of the latent structure paves the way for solving downstream tasks. Faced with significant challenges in statistics and computation, this project aims to develop efficient methodologies for estimating and inferring latent structures from heterogeneous data. This project will yield cutting-edge tools for scientific study, open-source software for easy implementation, and new mathematical theorems for theoretical analysis. The project will also provide numerous opportunities for statistical education and research training.

The project is structured into three parts. In the first part, the goal is to develop a new, flexible methodology for clustering high-dimensional data. This part aims at new algorithms that can identify non-spherical and even non-convex clusters. An in-depth analysis of mixture models brings theoretical insights, including tight finite-sample statistical error bounds and finite-iteration convergence guarantees for computation. In the second part, the goal is to study heterogeneous relational data that encode the information of individual objects in their pairwise relations. This part yields reliable methods for estimating and testing latent structures in the realistic scenario where the partially observed data may not be sampled uniformly at random. Finally, the third part focuses on the joint analysis of multiple related datasets, such as social networks with high-dimensional personal attributes. The tools developed in the first two parts of the project will constitute fundamental building blocks for addressing the research goals of this final part. The research findings will provide novel, efficient data integration strategies for enhanced statistical accuracy. The proposed research initiatives include dissemination of the new methods and algorithms in the form of publicly available software, and an active agenda for strengthening interdisciplinary research training and enhancing diversity in the statistical sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Kaizheng Wang

Implicit Regularization in Nonconvex Statistical Estimation
Design, synthesis, biological evaluation of urea substituted 1,2,5-oxadiazole-3-carboximidamides as novel indoleamine 2,3-dioxygenase-1 (IDO1) inhibitors
  • DOI:
    https://doi.org/10.1016/j.ejmech.2023.115217
  • Published:
    2023
  • Journal:
  • Impact factor:
  • Authors:
    Ke Ye;Kaizheng Wang;Tianyu Wang;He Tang;Lin Wang;Wanheng Zhang;Sheng Jiang;Xiangyu Zhang;Kuojun Zhang
  • Corresponding author:
    Kuojun Zhang
A handheld isothermal fluorescence detector for duplex visualization of aquatic pathogens via enhanced one-pot LAMP-PfAgo assay
  • DOI:
    10.1016/j.bios.2024.116187
  • Published:
    2024-06-15
  • Journal:
  • Impact factor:
  • Authors:
    Feibiao Pang;Tao Zhang;Fengyi Dai;Kaizheng Wang;Tianjiao Jiao;Zuoying Zhang;Liyi Zhang;Mingli Liu;Peng Hu;Jinzhao Song
  • Corresponding author:
    Jinzhao Song
Numerical Model of Two-Phase Gas–Liquid Streamer Discharge in Ester-Based Insulating Oil Under Impulse Voltage
  • DOI:
    10.1109/tps.2024.3374091
  • Published:
    2024
  • Journal:
  • Impact factor:
    1.5
  • Authors:
    Kaizheng Wang;Shuaiqi Wang;Ruilong Yu;Feipeng Wang;Shunzhen Zhou
  • Corresponding author:
    Shunzhen Zhou
Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow
  • DOI:
    10.48550/arxiv.2301.01766
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuling Yan;Kaizheng Wang;P. Rigollet
  • Corresponding author:
    P. Rigollet

Similar NSFC Grants

Computational Methods for Analyzing Toponome Data
  • Award number:
    60601030
  • Approval year:
    2006
  • Amount:
    CNY 170,000
  • Project type:
    Young Scientists Fund

Similar Overseas Grants

New statistical and computational tools for optimization of planarian behavioral chemical screens
  • Award number:
    10658688
  • Fiscal year:
    2023
  • Amount:
    $180,000
  • Project type:
Statistical Methods and Computational Tools for Marine Animal Movement, Distribution and Population Size
  • Award number:
    RGPIN-2019-05688
  • Fiscal year:
    2022
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Individual
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
  • Award number:
    RGPIN-2018-05147
  • Fiscal year:
    2022
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Individual
Statistical Methods and Computational Tools for Marine Animal Movement, Distribution and Population Size
  • Award number:
    RGPIN-2019-05688
  • Fiscal year:
    2021
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Individual
CAREER: Statistical approaches and computational tools for analyzing spatially-resolved single-cell transcriptomics data
  • Award number:
    2047611
  • Fiscal year:
    2021
  • Amount:
    $180,000
  • Project type:
    Continuing Grant
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
  • Award number:
    RGPIN-2018-05147
  • Fiscal year:
    2021
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Individual
Statistical Methods and Computational Tools for Marine Animal Movement, Distribution and Population Size
  • Award number:
    RGPAS-2019-00092
  • Fiscal year:
    2020
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Accelerator Supplements
Statistical Methods and Computational Tools for Marine Animal Movement, Distribution and Population Size
  • Award number:
    RGPIN-2019-05688
  • Fiscal year:
    2020
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Individual
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
  • Award number:
    RGPIN-2018-05147
  • Fiscal year:
    2020
  • Amount:
    $180,000
  • Project type:
    Discovery Grants Program - Individual
Computational Efficient Statistical Tools for Analyzing Substance Dependence Sequencing Data
  • Award number:
    9922519
  • Fiscal year:
    2019
  • Amount:
    $180,000
  • Project type: