Bayesian Sparse Dirichlet-Multinomial Models for Discovering Latent Structure in High-Dimensional Compositional Count Data

Basic Information

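Award number:
2245492
Principal investigator:
Matthew Koslovsky
Amount:
$165,000
Host institution:
Colorado State University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-09-01 to 2026-08-31
Project status:
Ongoing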

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2245492&HistoricalAwards=false
Keywords:
Bayesian Sparse Dirichlet Multinomial Models

Project Abstract

The collection and analysis of microbiome data have broad implications for furthering our understanding of human health and performance, agriculture, and ecology, among other areas. Human microbiome research, for example, aims to better understand the role of our microbial communities and how they interact with their host, respond to their environment, and influence disease. In addition to microbiome data being compositional, as the sum of the microbial taxa reads is fixed, and high-dimensional, they are also zero-inflated, as there are typically more zero reads observed than expected, which has profound implications on modeling and inference. This project aims to advance statistical methods and computational algorithms for the analysis of zero-inflated multivariate compositional count data. While developed to address the current challenges of microbiome data analysis, the methods will be generally applicable to other settings in which multivariate compositional count data with excess zeros are observed, including biomedical and public health research, econometrics, and ecology. The project will additionally provide educational and professional training and mentoring to graduate students.

Analyzing multivariate count data generated by high-throughput sequencing technology in omics research is challenging due to the high-dimensional and compositional structure of the data, over-dispersion, and potential zero inflation. In practice, researchers often use the Dirichlet-multinomial (DM) distribution and its variants to model these data. However, under the assumptions of a DM model, estimated probabilities for zero counts are strictly positive even if the true probability of occurrence is zero. This research project aims to develop a novel sparse DM (sDM) model which allows zero count probabilities to take on zero values to simultaneously accommodate potential zero inflation in multivariate compositional count data while estimating compositional probabilities. Additionally, this project will investigate extensions of the sDM modeling framework to high-dimensional variable selection and clustering problems and contribute Markov chain Monte Carlo algorithms for posterior inference that will be made publicly available to practitioners and other researchers.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
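
The abstract's remark that a DM model assigns strictly positive probability to taxa with zero counts can be illustrated with a small sketch. It uses the standard conjugate update: with a Dirichlet(alpha) prior on the compositional probabilities and multinomial counts x, the posterior is Dirichlet(alpha + x), so the posterior mean of taxon j is (alpha_j + x_j) / (sum(alpha) + sum(x)), which is positive even when x_j = 0. This is not the project's sDM model, whose details the abstract does not give; the function name, the count vector, and the prior value 0.5 are assumptions chosen only for illustration.

```python
def dm_posterior_mean(counts, alpha):
    """Posterior mean of compositional probabilities under a
    Dirichlet(alpha) prior with multinomial counts (conjugate update)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("Dirichlet concentration parameters must be positive")
    total = sum(counts) + sum(alpha)
    return [(x + a) / total for x, a in zip(counts, alpha)]


# Hypothetical sample: five taxa, two of them with zero reads.
counts = [120, 0, 35, 0, 45]
# Assumed symmetric prior, for illustration only.
alpha = [0.5] * len(counts)

post_mean = dm_posterior_mean(counts, alpha)

for x, p in zip(counts, post_mean):
    print(f"reads={x:4d}  posterior mean probability={p:.5f}")
```

Because every concentration parameter must be positive, no choice of prior in this setup yields an estimate of exactly zero for an unobserved taxon. That is the limitation the abstract says the sparse DM model is meant to address.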
