Statistical Methods for Genomic Analysis of Species Divergences
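Basic information
- Grant number: BB/K000896/1
- Principal investigator:
- Amount: $425,700
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2013
- Funding country: United Kingdom
- Duration: 2013 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract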
Our evolutionary history is written in our genomes. By comparing DNA sequences from different species, we can work out how the species are related. By comparing the DNA sequences of multiple individuals from the same species, we can estimate the population size and infer demographic changes (such as population bottlenecks) of the species. Such studies fall into the domains of phylogenetics and population genetics. Genomic sequence data from multiple individuals of several closely related species allow powerful inference at the interface of phylogenetics and population genetics. One can use such data to estimate species divergence times and ancestral population sizes, accounting for lineage sorting, and to detect gene flow at the time of speciation or to test different models of speciation. Such data also allow delimitation of species (for example, to decide whether the sampled individuals belong to one or two species).
To achieve those goals, powerful statistical methods and computational algorithms are necessary. In this project we will implement such methods within two well-established statistical frameworks: maximum likelihood and Bayesian inference. We will develop maximum likelihood methods for estimating migration rates between populations, and design likelihood ratio tests to test whether there is gene flow at the time of speciation (that is, whether speciation is clean). We will implement models that allow the migration rate to decrease over time since species divergence. Those methods will be useful for testing different speciation models such as allopatric and parapatric speciation. Computational difficulties will limit our likelihood methods to 2 or 3 sequences at each sampled locus. However, the methods can accommodate a huge number of loci (indeed the whole genome), and with population data at some loci and species data at other loci, powerful inference is feasible.
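The likelihood ratio test for gene flow mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: the function name `lrt_gene_flow` and the log-likelihood values are hypothetical. It also assumes that the null hypothesis fixes a single migration-rate parameter at zero, the boundary of its range, so the null distribution of the test statistic is taken to be a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom.

```python
import math


def lrt_gene_flow(lnL_null, lnL_alt):
    """Likelihood ratio test of M = 0 (no gene flow) against M > 0.

    lnL_null and lnL_alt are maximised log-likelihoods under the two models
    (hypothetical inputs; in practice they would come from fitting each model).
    Returns the test statistic 2 * (lnL_alt - lnL_null) and an approximate
    p-value under the assumed 50:50 mixture null distribution.
    """
    # The alternative nests the null, so the statistic cannot be negative;
    # clamp small negative values caused by imperfect optimisation.
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    if stat == 0.0:
        return stat, 1.0
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    # The factor 0.5 is the weight of the chi-square component of the mixture.
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p_value = lrt_gene_flow(-1000.0, -998.0)
    print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

If the model of gene flow adds more than one parameter, or the null value is not on the boundary, a different null distribution would be needed.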
We will use computer simulations to examine the statistical properties of the new methods, and apply the methods to genomic datasets from the hominoids.
We will introduce significant improvements and extensions to a Bayesian model-comparison approach to delimiting species using genomic sequence data. Published a year ago (Yang and Rannala 2010 Proc Natl Acad Sci USA 107:9264-9269), this method has attracted much attention among evolutionary biologists. It uses an algorithm called reversible-jump Markov chain Monte Carlo (rjMCMC) to sample different species-delimitation models, such as the one-species model (which assumes that all sampled individuals are from a single species) and the two-species model (which assumes that the sampled individuals are from two distinct species). However, our current implementation in the computer program BPP has serious limitations and is inefficient for intermediate or large datasets. A major objective of this project is to improve the rjMCMC algorithm so that the program becomes feasible for the analysis of large genomic-scale datasets. We will also parallelize the programs to improve computational efficiency.
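The idea of using rjMCMC to move between a one-species and a two-species model can be illustrated with a toy sketch. It is not the algorithm implemented in BPP: the multispecies coalescent likelihood for sequence data is replaced by a simple Gaussian stand-in in which each "species" has one mean parameter. All names and constants (`rjmcmc`, `PRIOR_SD`, `TAU`, `STEP`) are hypothetical, and the two models are assumed to have equal prior probability. The split move turns one mean `mu` into `mu + u` and `mu - u` using an auxiliary draw `u`, and the merge move is its deterministic reverse.

```python
import math
import random

PRIOR_SD = 10.0  # assumed N(0, PRIOR_SD^2) prior on every mean
TAU = 1.0        # assumed sd of the auxiliary variable u in the split move
STEP = 0.3       # assumed random-walk step size for within-model updates


def log_norm(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def log_target(theta, group_a, group_b):
    """Unnormalised log posterior; theta has 1 mean (one species) or 2 (two species)."""
    mu_a, mu_b = (theta[0], theta[0]) if len(theta) == 1 else theta
    loglik = sum(log_norm(x, mu_a, 1.0) for x in group_a)
    loglik += sum(log_norm(x, mu_b, 1.0) for x in group_b)
    logprior = sum(log_norm(m, 0.0, PRIOR_SD) for m in theta)
    return loglik + logprior  # equal model priors cancel in every ratio


def rjmcmc(group_a, group_b, n_iter=20000, seed=1):
    """Return the estimated posterior probability of the two-species model."""
    rng = random.Random(seed)
    theta = [0.0]  # start in the one-species model
    cur = log_target(theta, group_a, group_b)
    burn_in, hits, kept = n_iter // 5, 0, 0
    for it in range(n_iter):
        if rng.random() < 0.5:
            # Within-model update: symmetric random walk, no correction term.
            prop = [m + rng.gauss(0.0, STEP) for m in theta]
            adjust = 0.0
        elif len(theta) == 1:
            # Split: (mu, u) -> (mu + u, mu - u); the Jacobian is 2.
            u = rng.gauss(0.0, TAU)
            prop = [theta[0] + u, theta[0] - u]
            adjust = math.log(2.0) - log_norm(u, 0.0, TAU)
        else:
            # Merge: the reverse of the split, so the correction is inverted.
            u = 0.5 * (theta[0] - theta[1])
            prop = [0.5 * (theta[0] + theta[1])]
            adjust = log_norm(u, 0.0, TAU) - math.log(2.0)
        new = log_target(prop, group_a, group_b)
        # Metropolis-Hastings-Green acceptance step.
        if rng.random() < math.exp(min(0.0, new - cur + adjust)):
            theta, cur = prop, new
        if it >= burn_in:
            kept += 1
            hits += len(theta) == 2
    return hits / kept


if __name__ == "__main__":
    base = [-1.0, -0.5, 0.0, 0.5, 1.0] * 4
    print("same groups:   ", rjmcmc(base, base))
    print("shifted groups:", rjmcmc(base, [x + 3.0 for x in base]))
```

In this toy setting the sampler should spend most of its time in the one-species model when the two groups look alike and in the two-species model when they are clearly separated. How well such a chain mixes depends heavily on the proposal (here `TAU`), which is the kind of efficiency issue the abstract says it aims to improve.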
Project outcomes
Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A discrete-beta model for testing gene flow after speciation
- DOI: 10.1111/2041-210x.12356
- Published: 2015-06-01
- Journal:
- Impact factor: 6.6
- Authors: Liu, Junfeng; Zhang, De-Xing; Yang, Ziheng
- Corresponding author: Yang, Ziheng
A biologist's guide to Bayesian phylogenetic analysis.
- DOI: 10.1038/s41559-017-0280-x
- Published: 2017-10
- Journal:
- Impact factor: 16.8
- Authors: Nascimento FF; Reis MD; Yang Z
- Corresponding author: Yang Z
The Influence of Gene Flow on Species Tree Estimation: A Simulation Study
- DOI: 10.1093/sysbio/syt049
- Published: 2014-01-01
- Journal:
- Impact factor: 6.5
- Authors: Leache, Adam D.; Harris, Rebecca B.; Yang, Ziheng
- Corresponding author: Yang, Ziheng
Bayesian species delimitation can be robust to guide-tree inference errors.
- DOI: 10.1093/sysbio/syu052
- Published: 2014-11
- Journal:
- Impact factor: 6.5
- Authors: Zhang C; Rannala B; Yang Z
- Corresponding author: Yang Z
Unguided species delimitation using DNA sequence data from multiple Loci.
- DOI: 10.1093/molbev/msu279
- Published: 2014-12
- Journal:
- Impact factor: 10.7
- Authors: Yang Z; Rannala B
- Corresponding author: Rannala B
Other publications by Ziheng Yang
A space-time process model for the evolution of DNA sequences.
- DOI: 10.1093/genetics/139.2.993
- Published: 1995-02
- Journal:
- Impact factor: 3.3
- Authors: Ziheng Yang
- Corresponding author: Ziheng Yang
Evolutionary rate variation among vertebrate beta globin genes: implications for dating gene family duplication events.
- DOI:
- Published: 2006
- Journal:
- Impact factor: 3.5
- Authors: Gabriela Aguileta; J. Bielawski; Ziheng Yang
- Corresponding author: Ziheng Yang
A heuristic rate smoothing procedure for maximum likelihood estimation of species divergence times
- DOI:
- Published: 2004
- Journal:
- Impact factor: 0
- Authors: Ziheng Yang
- Corresponding author: Ziheng Yang
Maximum-likelihood models for combined analyses of multiple sequence data
- DOI: 10.1007/bf02352289
- Published: 1996-05
- Journal:
- Impact factor: 3.9
- Authors: Ziheng Yang
- Corresponding author: Ziheng Yang
Other grants by Ziheng Yang
Efficient computational technologies to resolve the Timetree of Life: from ancient DNA to species-rich phylogenies
- Grant number: BB/Y004132/1
- Fiscal year: 2024
- Funding amount: $425,700
- Project category: Research Grant
PAML 5: A friendly and powerful bioinformatics resource for phylogenomics
- Grant number: BB/X018571/1
- Fiscal year: 2024
- Funding amount: $425,700
- Project category: Research Grant
NSFDEB-NERC: Integrating computational, phenotypic, and population-genomic approaches to reveal processes of cryptic speciation and gene flow in Madag
- Grant number: NE/X002071/1
- Fiscal year: 2023
- Funding amount: $425,700
- Project category: Research Grant
Bayesian inference of the mode of speciation and gene flow using genomic data
- Grant number: BB/X007553/1
- Fiscal year: 2023
- Funding amount: $425,700
- Project category: Research Grant
Bayesian implementation of the multispecies-coalescent-with-introgression (MSci) model for analysis of population genomic data
- Grant number: BB/T003502/1
- Fiscal year: 2020
- Funding amount: $425,700
- Project category: Research Grant
Efficient Bayesian phylogenomic dating with new models of trait evolution and rich diversities of living and fossil species
- Grant number: BB/T012951/1
- Fiscal year: 2020
- Funding amount: $425,700
- Project category: Research Grant
Phylogeographic inference using genomic sequence data under the multispecies coalescent model
- Grant number: BB/P006493/1
- Fiscal year: 2017
- Funding amount: $425,700
- Project category: Research Grant
Improving Bayesian methods for estimating divergence times integrating genomic and trait data
- Grant number: BB/N000609/1
- Fiscal year: 2016
- Funding amount: $425,700
- Project category: Research Grant
Bayesian Estimation of Species Divergence Times Integrating Fossil and Molecular Information
- Grant number: BB/J009709/1
- Fiscal year: 2012
- Funding amount: $425,700
- Project category: Research Grant
Representation and Incorporation of Fossil Data in Molecular Dating of Species Divergences
- Grant number: BB/G006431/1
- Fiscal year: 2009
- Funding amount: $425,700
- Project category: Research Grant
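Similar grants from the National Natural Science Foundation of China
Computational Methods for Analyzing Toponome Data
- Grant number: 60601030
- Approval year: 2006
- Funding amount: 170,000 CNY
- Project category: Young Scientists Fund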
Similar overseas grants
Development of Efficient and Practical Privacy-Preserving Methods for Large-Scale Genomic Statistical Analysis
- Grant number: 23KJ0649
- Fiscal year: 2023
- Funding amount: $425,700
- Project category: Grant-in-Aid for JSPS Fellows
Incorporating geography into statistical methods for analysis of population genomic DNA
- Grant number: 10737747
- Fiscal year: 2022
- Funding amount: $425,700
- Project category:
Incorporating geography into statistical methods for analysis of population genomic DNA
- Grant number: 10615605
- Fiscal year: 2022
- Funding amount: $425,700
- Project category:
Statistical methods for genomic analysis of heterogeneous tumors
- Grant number: 10662552
- Fiscal year: 2022
- Funding amount: $425,700
- Project category:
Statistical Methods and Algorithms for Population Genomic Inference
- Grant number: 9886109
- Fiscal year: 2020
- Funding amount: $425,700
- Project category:
Incorporating geography into statistical methods for analysis of population genomic DNA
- Grant number: 10027142
- Fiscal year: 2020
- Funding amount: $425,700
- Project category:
Statistical Methods and Algorithms for Population Genomic Inference
- Grant number: 10087945
- Fiscal year: 2020
- Funding amount: $425,700
- Project category:
Incorporating geography into statistical methods for analysis of population genomic DNA
- Grant number: 10200099
- Fiscal year: 2020
- Funding amount: $425,700
- Project category:
Statistical Methods and Algorithms for Population Genomic Inference
- Grant number: 10333220
- Fiscal year: 2020
- Funding amount: $425,700
- Project category:
Statistical Methods and Algorithms for Population Genomic Inference
- Grant number: 10552694
- Fiscal year: 2020
- Funding amount: $425,700
- Project category: