Bayesian Sparse Dirichlet-Multinomial Models for Discovering Latent Structure in High-Dimensional Compositional Count Data
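Basic information
- Award number: 2245492
- Principal investigator:
- Amount: $165,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-09-01 to 2026-08-31
- Project status: Ongoing
- Source:
- Keywords: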
Project abstract
The collection and analysis of microbiome data have broad implications for furthering our understanding of human health and performance, agriculture, and ecology, among other areas. Human microbiome research, for example, aims to better understand the role of our microbial communities and how they interact with their host, respond to their environment, and influence disease. Microbiome data are compositional, because the sum of the microbial taxa reads is fixed, and high-dimensional. They are also zero-inflated, because more zero reads are typically observed than expected, which has profound implications for modeling and inference. This project aims to advance statistical methods and computational algorithms for the analysis of zero-inflated multivariate compositional count data. While developed to address the current challenges of microbiome data analysis, the methods will be generally applicable to other settings in which multivariate compositional count data with excess zeros are observed, including biomedical and public health research, econometrics, and ecology. The project will additionally provide educational and professional training and mentoring to graduate students.
Analyzing multivariate count data generated by high-throughput sequencing technology in omics research is challenging due to the high-dimensional and compositional structure of the data, over-dispersion, and potential zero inflation. In practice, researchers often use the Dirichlet-multinomial (DM) distribution and its variants to model these data. However, under the assumptions of a DM model, estimated probabilities for zero counts are strictly positive even if the true probability of occurrence is zero. This research project aims to develop a novel sparse DM (sDM) model that allows zero-count probabilities to be exactly zero, accommodating potential zero inflation in multivariate compositional count data while estimating compositional probabilities.
Additionally, this project will investigate extensions of the sDM modeling framework to high-dimensional variable selection and clustering problems and contribute Markov chain Monte Carlo algorithms for posterior inference that will be made publicly available to practitioners and other researchers.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
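The abstract's remark that a DM model assigns strictly positive probability to taxa with zero counts can be seen in a small sketch. Under a Dirichlet prior on multinomial counts, the posterior mean for taxon j is (x_j + alpha_j) / (N + sum of alpha), which cannot be zero when alpha_j > 0. The counts, the prior, and the `present` flags below are hypothetical values chosen for illustration, and the masked variant is only a toy picture of what "allowing exact zeros" means; it is not the project's sDM model or its inference algorithm.

```python
# Toy illustration only: hypothetical counts and prior, not data or code from the project.

def dm_posterior_mean(counts, alpha):
    """Posterior mean of the taxon probabilities when multinomial counts
    are paired with a Dirichlet(alpha) prior: (x_j + alpha_j) / (N + sum(alpha))."""
    total = sum(counts) + sum(alpha)
    return [(x + a) / total for x, a in zip(counts, alpha)]


def masked_posterior_mean(counts, alpha, present):
    """Same calculation restricted to taxa flagged as present; taxa flagged
    absent get probability exactly zero. The `present` flags are an assumption
    made for illustration, not the project's sDM specification."""
    for x, keep in zip(counts, present):
        if x > 0 and not keep:
            raise ValueError("a taxon with observed reads cannot be marked absent")
    total = sum(x + a for x, a, keep in zip(counts, alpha, present) if keep)
    return [(x + a) / total if keep else 0.0
            for x, a, keep in zip(counts, alpha, present)]


counts = [120, 0, 35, 0, 45]               # hypothetical reads for five taxa
alpha = [0.5] * len(counts)                # hypothetical symmetric Dirichlet prior
present = [True, False, True, True, True]  # hypothetical: second taxon truly absent

dm_probs = dm_posterior_mean(counts, alpha)
sparse_probs = masked_posterior_mean(counts, alpha, present)

print(dm_probs)      # every entry is strictly positive, including zero-count taxa
print(sparse_probs)  # the taxon flagged absent has probability exactly 0.0
```

In this sketch the two zero-count taxa are treated differently: the one flagged absent receives probability zero, while the other keeps a small positive probability, as a sampling zero would. In practice the flags would be unknown and would have to be inferred along with the probabilities.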
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Matthew Koslovsky
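Similar grants from the National Natural Science Foundation of China
SAR image noise suppression and segmentation based on the Sparse-Land model
- Award number: 60971128
- Approval year: 2009
- Funding amount: 300,000 yuan
- Project type: General Program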
Similar overseas grants
The Global Structure of Sparse Networks
- Award number: DP240100198
- Fiscal year: 2024
- Funding amount: $165,000
- Project type: Discovery Projects
CAREER: Compiler and Runtime Support for Sampled Sparse Computations on Heterogeneous Systems
- Award number: 2338144
- Fiscal year: 2024
- Funding amount: $165,000
- Project type: Continuing Grant
ERI: AI-Enhanced Dynamic Interference Suppression in Cognitive Sensing with Reconfigurable Sparse Arrays
- Award number: 2347220
- Fiscal year: 2024
- Funding amount: $165,000
- Project type: Standard Grant
CIF: Small: Learning Sparse Vector and Matrix Graphs from Time-Dependent Data
- Award number: 2308473
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Standard Grant
Sparse Sensor Array Design and Processing
- Award number: 2236023
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Standard Grant
CAREER: Physics-inspired Machine Learning with Sparse and Asynchronous p-bits
- Award number: 2237357
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Continuing Grant
Inverting turbulence: flow patterns and parameters from sparse data
- Award number: EP/X017273/1
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Research Grant
Creating digital twins of flows from noisy and sparse flow-MRI data
- Award number: EP/X028232/1
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Fellowship
Realization of sparse control with model predictive control and guarantee of its performance
- Award number: 23K03916
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Grant-in-Aid for Scientific Research (C)
Bayesian Learning for Sparse High-Dimensional Data
- Award number: 2889818
- Fiscal year: 2023
- Funding amount: $165,000
- Project type: Studentship