
Methods for Evolutionary Genomics Analysis


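Basic information

Grant number:
10322021
Principal investigator:
Sudhir Kumar
Amount:
$495,300
Host institution:
TEMPLE UNIV OF THE COMMONWEALTH
Host institution country:
United States
Project category:
Fiscal year:
2021
Funding country:
United States
Project period:
2021-02-01 to 2026-01-31
Project status:
Ongoing (not yet concluded)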

Project abstract

Continuing advances in nucleotide sequencing have resulted in the assembly of datasets containing large numbers of species, genes, and genomic segments. Phylogenomic analyses of these data are essential to progress in understanding evolutionary patterns across the tree of life, and are finding increasing numbers of applications in practical analyses that require understanding of how patterns change over time. The sheer size of phylogenomic datasets limits the practical utility of available methods due to excessive time and memory requirements.
We have developed many high-impact methods and tools for comparative analysis of molecular sequences, a tradition we propose to continue through this MIRA project by developing innovative methods that address new challenges in phylogenomics. We will focus on pattern-based approaches of machine learning with a sparsity constraint (SL) applied to phylogenomics, as a complement to traditional model-based methods in molecular evolution and phylogenetics. In the proposed SL in Phylogenomics (SLiP) framework, we will build models that best explain the biological trait or evolutionary hypothesis of interest, with genomic loci, such as genes, proteins, and genomic segments, serving as model parameters.
Preliminary results from two example applications establish the premise and promise of a general SLiP framework. In one, SLiP successfully detected loci whose inclusion in a phylogenomic dataset overtakes a consistent and contrasting signal from hundreds of other loci when inferring phylogenetic relationships. In the other example, SLiP revealed loci and biological functional categories that harbor convergent sequence evolutionary patterns associated with the emergence of the same trait in distinct evolutionary lineages. In all of these analyses, SLiP required only a small fraction of the computational time and memory demanded by traditional methods, and it enabled better evolutionary contrasts with fewer assumptions.
Consequently, the successful development of SLiP will improve the feasibility, rigor, and reproducibility of large-scale data analysis. It will also democratize big data analytics via shortened analysis time and a relatively small memory footprint, and encourage the development of a new class of methods for phylogenomic analysis. This framework will be accessed from a free library of SLiP functions, which will be directly usable via the command line and available in a graphical interface through integration with the MEGA software.
