
Robust, scalable, and accurate discovery of mutational signatures


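Basic Information

Grant number:
10491360
Principal investigator:
Jonathan Huggins
Award amount:
$199,300
Host institution:
BOSTON UNIVERSITY (CHARLES RIVER CAMPUS)
Host institution country:
United States
Project category:
Fiscal year:
2021
Funding country:
United States
Project period:
2021-09-20 to 2024-06-30
Project status:
Completed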

Project Abstract

The mutational signatures inferred from tumor genome sequences have the potential to provide a record of environmental exposure and can give clues about the etiology of carcinogenesis. However, for inferred signatures to be biologically meaningful, each signature must accurately represent the contribution of different mutation types in each mutagenic process. Heuristic algorithms based on non-negative matrix factorization (NMF) have been the primary tools for discovering mutational signatures, but these approaches are inflexible, non-robust, and require massive amounts of computation. The objective of the proposed project is to develop computationally efficient algorithms that, despite imperfect modeling assumptions, can discover biologically meaningful signatures. Aim 1 supports this objective by developing a new framework for scalable, easy-to-use, and accurate variational inference (a widely used approach to approximate Bayesian inference) that is applicable to mutational signature discovery models. Aim 2 develops statistical methods to extract biologically meaningful signatures from the inferences obtained using the proposed variational inference framework. The accuracy and statistical validity of the methods developed in Aims 1 and 2 are ensured through theoretical analysis and numerical experiments on synthetic and real data. Finally, Aim 3 improves upon the current understanding of mutational processes by (1) applying the methods developed in Aims 1 and 2 to a large Pan-Cancer dataset and (2) developing a novel model that allows for the structured incorporation of single-base substitutions, double-base substitutions, insertions, and deletions in each signature. The proposed work is well-positioned to replace heuristics used for discovering meaningful representations of data, and so to have a long-term impact on how other genomic data types, such as single-cell RNA-seq, are analyzed.
This work is also directly relevant to the NIGMS, as it falls under “DNA and RNA metabolisms (repair)”: many mutational processes are related to aberrant DNA repair or to “clock-like” molecular mechanisms associated with aging, which can be observed in histologically normal-appearing tissue.
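
As an illustration of the NMF baseline the abstract refers to (not the variational method the project proposes), the sketch below factors a small mutation-count matrix into signatures and per-sample exposures. It assumes the classic Lee and Seung multiplicative updates for squared error; the abstract does not say which NMF variant existing tools use. The helper names (`matmul`, `transpose`, `nmf`) and the toy data are hypothetical, and the code uses only the Python standard library to stay dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, n_iter=500, seed=0, eps=1e-12):
    """Approximate non-negative v (types x samples) as w @ h.

    w is (types x rank): one candidate signature per column.
    h is (rank x samples): the exposure of each sample to each signature.
    Assumption: Lee-Seung multiplicative updates for squared error.
    """
    rng = random.Random(seed)
    n_types, n_samples = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_types)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # h <- h * (w^T v) / (w^T w h), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        # w <- w * (v h^T) / (w h h^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_types)]
    # Rescale each signature to sum to 1 and move the scale into the exposures.
    for k in range(rank):
        scale = sum(w[i][k] for i in range(n_types))
        if scale > 0:
            for i in range(n_types):
                w[i][k] /= scale
            h[k] = [x * scale for x in h[k]]
    return w, h


# Toy data (invented for illustration): 4 mutation types, 2 signatures, 6 samples.
true_w = [[0.7, 0.1], [0.1, 0.1], [0.1, 0.1], [0.1, 0.7]]
true_h = [[100, 0, 50, 80, 10, 30], [0, 100, 50, 20, 90, 60]]
v = matmul(true_w, true_h)

w, h = nmf(v, rank=2)
for k in range(2):
    print("signature", k, [round(w[i][k], 3) for i in range(4)])
```

NMF solutions are generally not unique, so the recovered columns need not match `true_w` even when the reconstruction is close. Real analyses also involve choosing the number of signatures and handling count noise, neither of which this sketch attempts.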
