Developing Machine Learning Models for the Analysis of Splicing Data in Large Heterogeneous Cohorts


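Basic information

  • Grant number:
    10315802
  • Principal investigator:
  • Amount:
    $46,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-08-01 to 2024-07-31
  • Project status:
    Completed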

Project abstract

Analysis of RNA sequencing (RNASeq) data obtained from large patient cohorts can reveal transcriptomic perturbations that are associated with complex disease and facilitate the identification of disease subtypes. This is typically framed as an unsupervised learning task: discovering latent structure in a matrix of RNASeq-based quantifications of gene expression or local splicing variations (LSVs). However, several factors make the analysis of such heterogeneous data challenging. First, these datasets comprise samples processed at multiple institutions, which may employ different sequencing protocols and quality control steps. This introduces confounding factors such as inconsistent sample quality or variable cell type proportions, which can hinder detection of true biological signal. Second, in acute myeloid leukemia (AML), mutations in splicing factor genes that occur in a subset of patients may alter only a subset of coregulated splicing events. Thus, instead of measuring global similarity between samples based on all transcriptomic features, there is a need to efficiently identify "tiles", each defined by a subset of samples and a subset of splicing events with abnormal signals. Although several algorithms have been proposed for this task, they fail to overcome many of the computational challenges associated with modeling splicing data and are not well suited to handling missing values. To facilitate analysis of heterogeneous splicing datasets by reducing false positive discoveries and boosting true biological signal, we will first develop a model to correct for the effects of RNA degradation and cell type mixtures.
Then, to efficiently identify AML subtypes characterized by splicing events and to account for splicing-specific modeling challenges, we propose CHESSBOARD (Characterizing Heterogeneity of Expression and Splicing by Search for Blocks of Abnormalities and Outliers in RNA Datasets), a nonparametric Bayesian model for unsupervised discovery of tiles. We will apply our models to synthetic datasets and show that they outperform several baseline approaches. Next, we will show that CHESSBOARD recovers tiles characterized by known and novel splicing aberrations that are reproducible across multiple AML patient cohorts. Finally, we will show that the discovered tiles are correlated with response to therapeutics, pointing to the translational impact of our findings.
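To make the notion of a "tile" concrete, the following is a minimal sketch in Python (standard library only). It is not CHESSBOARD and not the applicants' method: it plants one tile in a synthetic samples-by-events matrix with missing entries, then recovers it with a naive robust-outlier baseline. Every name and parameter here (`make_synthetic_psi`, `flag_outliers`, `naive_tile`, the matrix size, the shift, the missing rate, the thresholds) is a hypothetical choice made for illustration.

```python
import random
import statistics


def make_synthetic_psi(n_samples=40, n_events=30, tile_samples=range(8),
                       tile_events=range(6), shift=0.6, missing_rate=0.1, seed=0):
    """Build a samples x events matrix of PSI-like values with one planted tile.

    Missing quantifications are stored as None. All defaults are illustrative.
    """
    rng = random.Random(seed)
    matrix = []
    for i in range(n_samples):
        row = []
        for j in range(n_events):
            value = rng.gauss(0.3, 0.03)          # background inclusion level
            if i in tile_samples and j in tile_events:
                value += shift                    # aberrant splicing inside the tile
            value = min(1.0, max(0.0, value))     # keep values in [0, 1]
            row.append(None if rng.random() < missing_rate else value)
        matrix.append(row)
    return matrix


def flag_outliers(matrix, z_cut=5.0):
    """Flag observed entries far from their event's median (MAD-scaled)."""
    n_samples, n_events = len(matrix), len(matrix[0])
    flags = [[False] * n_events for _ in range(n_samples)]
    for j in range(n_events):
        observed = [row[j] for row in matrix if row[j] is not None]
        if len(observed) < 3:
            continue                              # too few values to judge
        med = statistics.median(observed)
        mad = statistics.median([abs(v - med) for v in observed])
        scale = 1.4826 * mad or 1e-9              # guard against zero spread
        for i in range(n_samples):
            v = matrix[i][j]
            if v is not None and abs(v - med) / scale > z_cut:
                flags[i][j] = True
    return flags


def naive_tile(matrix, flags, min_outliers=3, min_fraction=0.5):
    """Pick events with several outliers, then samples aberrant in most of them."""
    n_samples, n_events = len(matrix), len(matrix[0])
    events = [j for j in range(n_events)
              if sum(flags[i][j] for i in range(n_samples)) >= min_outliers]
    samples = []
    for i in range(n_samples):
        observed = [j for j in events if matrix[i][j] is not None]
        hits = sum(flags[i][j] for j in observed)
        if observed and hits / len(observed) >= min_fraction:
            samples.append(i)
    return samples, events


psi = make_synthetic_psi()
tile_samples, tile_events = naive_tile(psi, flag_outliers(psi))
print("samples in tile:", tile_samples)
print("events in tile:", tile_events)
```

This baseline only works because the planted signal is large and there is a single tile. It has no model of overlapping tiles, batch effects, RNA degradation, or cell type mixtures, which are the difficulties the abstract says motivate a dedicated model.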

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by David Wang

Developing Machine Learning Models for the Analysis of Splicing Data in Large Heterogeneous Cohorts
  • Grant number:
    10506326
  • Fiscal year:
    2021
  • Funding amount:
    $46,000
  • Project category:
Developing Machine Learning Models for the Analysis of Splicing Data in Large Heterogeneous Cohorts
  • Grant number:
    10672974
  • Fiscal year:
    2021
  • Funding amount:
    $46,000
  • Project category:
Neurodifferentiation/Stem Cell Unit
  • Grant number:
    10916077
  • Fiscal year:
  • Funding amount:
    $46,000
  • Project category:
Neurodifferentiation/Stem Cell Unit
  • Grant number:
    10708659
  • Fiscal year:
  • Funding amount:
    $46,000
  • Project category:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Fellowship
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Continuing Grant
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Research Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $46,000
  • Project category:
    Research Grant