Bayesian Differential Causal Network and Clustering Methods for Single-Cell Data
Basic Information
- Grant number: 10592720
- Principal investigator:
- Amount: $304,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-09-21 to 2026-08-31
- Project status: Ongoing
- Source:
- Keywords: Address; Bayesian Modeling; Biological; Cell Differentiation process; Cells; Comparative Study; Data; Dependence; Development; Etiology; Feedback; Gene Expression Regulation; Genes; Intervention; Joints; Knowledge; Measures; Methods; Modeling; Molecular; National Institute of General Medical Sciences; Nature; Neighborhoods; Pathologic; Prevention; Procedures; Regulator Genes; Research; Sample Size; Sampling; Technology; Testing; Time; Tissues; Translating; Uncertainty; Work; base; causal variant; cell type; differential expression; disease diagnosis; experimental group; gene regulatory network; improved; network models; non-Gaussian model; single-cell RNA sequencing; tool; transcriptome sequencing; treatment group
Project Abstract
Project Description
DMS/NIGMS 2: Bayesian Differential Causal Network and Clustering Methods for Single-Cell Data
A Significance
A.1 Importance of the Problem to Be Addressed
Single-cell RNA-sequencing (scRNA-seq) technologies have facilitated biological discoveries that were impossible with bulk RNA-seq, such as identifying new gene regulatory activities and cell types at the single-cell level. However, to translate the fundamental biological knowledge advanced by scRNA-seq into improved disease diagnosis, treatment, and prevention, new methods are required to comparatively study the molecular differences between normal and pathological cells/tissues, and between control and case/treatment groups. Although identification of differentially expressed genes across two sample groups has been extensively studied, the vast majority of existing methods for identifying gene regulatory networks (GRNs) and cell types have so far focused on scRNA-seq data generated under a single experimental condition. In principle, these methods can be applied to one experimental condition at a time, after which post hoc comparisons can be made to find the differences caused by experimental interventions. However, compared to joint modeling approaches, this two-step procedure is deemed less efficient and more susceptible to false discoveries because uncertainty is not properly propagated from the first step to the second. Moreover, most scRNA-seq network models are correlative in nature and do not infer causal gene regulatory relationships. There is, therefore, a critical need to develop new models for identifying the effects of experimental interventions on causal gene regulation and cell composition by jointly modeling scRNA-seq data across experimental groups. In the absence of such tools, mechanistically understanding gene regulation and cell differentiation, and fully realizing the translational value of scRNA-seq studies, will likely remain difficult.
A.2 Rigor of Prior Research
Aim 1. Many existing scRNA-seq network approaches adapt standard association measures, e.g., Pearson correlation [1] and mutual information [2], to zero-inflated scRNA-seq data. A common limitation of these methods is that they quantify only marginal dependencies, which are susceptible to spurious indirect associations [3]. Graphical models, which deal with conditional associations, are powerful alternatives to marginal association measures. Numerous methods have been proposed for general purposes [4, 5], including developments for non-Gaussian data [6–9]. Specifically for scRNA-seq data, two undirected graphical models, including Co-I Cai's work [10, 11], were recently proposed based on neighborhood selection; however, they do not infer causal gene regulation. To identify causal relationships, several alternative methods [12, 13] were developed. However, these methods either ignore the count nature of scRNA-seq data, require a known pseudotime (which is rarely available in real scRNA-seq data), or do not theoretically investigate causal identifiability for cross-sectional observations. For differential networks, many approaches [14–18], including the PI's prior work [19], have been developed for bulk RNA-seq data and have shown great advantages of joint analyses over independent analyses. However, far fewer differential network methods exist for scRNA-seq data, e.g., PT [20] and scdNet [21]. The common limitation of PT and scdNet is that they consider only marginal dependence (and are hence susceptible to false discoveries) and do not discover causality. Our preliminary results (§C.1) demonstrate that the proposed Bayesian network model is capable of identifying causal gene regulatory relationships in cross-sectional scRNA-seq data and often outperforms state-of-the-art alternative methods.
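The difference between marginal and conditional association described under Aim 1 can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions: it uses Gaussian toy data rather than zero-inflated counts, the three variables and the 0.8 coefficients are hypothetical, and partial correlation stands in for a generic conditional association measure. It is not the proposed Bayesian network model, and it does not recover causal direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 5000

# Hypothetical chain: gene_a -> gene_b -> gene_c, with no direct a -> c effect.
gene_a = rng.normal(size=n_cells)
gene_b = 0.8 * gene_a + rng.normal(size=n_cells)
gene_c = 0.8 * gene_b + rng.normal(size=n_cells)
data = np.column_stack([gene_a, gene_b, gene_c])

# Marginal association: Pearson correlation between each pair of genes.
marginal = np.corrcoef(data, rowvar=False)

# Conditional association: partial correlations from the inverse covariance
# (precision) matrix, i.e. each pair conditioned on the remaining gene.
precision = np.linalg.inv(np.cov(data, rowvar=False))
scale = np.sqrt(np.diag(precision))
partial = -precision / np.outer(scale, scale)
np.fill_diagonal(partial, 1.0)

print("marginal corr(a, c):", round(float(marginal[0, 2]), 3))
print("partial  corr(a, c | b):", round(float(partial[0, 2]), 3))
```

In this toy setting the marginal correlation between `gene_a` and `gene_c` is clearly nonzero even though the two are linked only through `gene_b`, while the partial correlation given `gene_b` is close to zero. This is the kind of spurious indirect association that conditional (graphical) models are meant to filter out.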
Aim 2. Very few methods are available to construct cell-specific networks because it is difficult to estimate networks with, in essence, a sample size of one. Recently, a hypothesis testing approach [22] was developed to estimate cell-specific networks. The method makes approximate network inference for each cell based on its neighbors. However, it considers only symmetric (undirected) marginal dependence, and therefore cannot infer causal regulatory relationships and is susceptible to spurious associations. The PI's prior work [23] addressed the "sample-size-one" problem in bulk RNA-seq data by assuming that the causal networks are smooth functions of additional covariates. However, the method is not applicable without covariates and does not allow feedback loops, a common motif in GRNs. Existing work [24, 25] including the PI's [19] has
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Grants by Yang Ni
Bayesian Differential Causal Network and Clustering Methods for Single-Cell Data
- Grant number: 10707494
- Fiscal year: 2022
- Funding amount: $304,400
- Project category:
Similar Overseas Grants
Bayesian Modeling and Inference for High-Dimensional Disease Mapping and Boundary Detection
- Grant number: 10568797
- Fiscal year: 2023
- Funding amount: $304,400
- Project category:
Bayesian modeling of multivariate mixed longitudinal responses with scale mixtures of multivariate normal distributions
- Grant number: 10730714
- Fiscal year: 2023
- Funding amount: $304,400
- Project category:
Bayesian Modeling and Scalable Inference for Big Data Streams
- Grant number: RGPIN-2019-03962
- Fiscal year: 2022
- Funding amount: $304,400
- Project category: Discovery Grants Program - Individual
Bayesian modeling on ethical consumption and its empirical application for behavior modification
- Grant number: 21K18559
- Fiscal year: 2021
- Funding amount: $304,400
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Utilizing Bayesian modeling to improve mutational signature inference in large-scale datasets
- Grant number: 10684720
- Fiscal year: 2021
- Funding amount: $304,400
- Project category:
Bayesian Modeling and Scalable Inference for Big Data Streams
- Grant number: RGPIN-2019-03962
- Fiscal year: 2021
- Funding amount: $304,400
- Project category: Discovery Grants Program - Individual
Bayesian Modeling of Mass-Spec Proteomics Data to Advance Studies of the Genetic Regulation of Proteins
- Grant number: 10391171
- Fiscal year: 2021
- Funding amount: $304,400
- Project category:
Utilizing Bayesian modeling to improve mutational signature inference in large-scale datasets
- Grant number: 10490301
- Fiscal year: 2021
- Funding amount: $304,400
- Project category:
Utilizing Bayesian modeling to improve mutational signature inference in large-scale datasets
- Grant number: 10305242
- Fiscal year: 2021
- Funding amount: $304,400
- Project category:
Bayesian Modeling and Scalable Inference for Big Data Streams
- Grant number: RGPIN-2019-03962
- Fiscal year: 2020
- Funding amount: $304,400
- Project category: Discovery Grants Program - Individual


















