Efficient Bayesian phylogenomic dating with new models of trait evolution and rich diversities of living and fossil species
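Basic information
- Grant number: BB/T01282X/1
- Principal investigator:
- Amount: $259,800
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2020
- Funding country: United Kingdom
- Period: 2020 to (no data)
- Project status: Ongoing
- Source:
- Keywords: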
Project abstract
As species diverge, they accumulate nucleotide substitutions in their genomes at a rate that is approximately constant in time. Substitutions can therefore serve as timepieces for inferring species divergences. By incorporating information from the fossil record, the inferred speciation times can be calibrated to geological time. This method, known as molecular-clock dating, has broad applications in evolutionary biology, such as studying the timing of the spread of viral pandemics, ancient rates of diversification in animals and plants, the relationship of species evolution with past climate or extinction events, human evolution, and the origin of agriculture and animal domestication. Indeed, evolutionary timetrees provide much richer information about species histories than trees without temporal information, allowing hypotheses on evolutionary timescales to be formulated and tested.

Currently, Bayesian methods are the state of the art in molecular-clock dating, as they allow flexible modelling of evolutionary processes and integration of fossil uncertainties in the analysis. Advances in Bayesian clock dating include stochastic models of rate variation among lineages (so-called relaxed-clock models), modelling of trait evolution in extant and extinct taxa, and the development of "soft bounds" and flexible fossil calibration densities. While these advances have made the Bayesian method attractive for clock dating, Bayesian computation relies on MCMC sampling, which requires computationally expensive stochastic simulation and so precludes use of the Bayesian method for the analysis of large-scale datasets. This is unfortunate, since large-scale molecular datasets are now commonplace: several high-throughput genome sequencing projects have been announced or are in progress, and we expect a flood of genome-scale data for several thousand species (e.g. the 10K animal genomes and the UK's 66K eukaryotic genomes projects).
This deluge of genome data has been accompanied by an explosive increase in the number of morphological datasets, driven by a computational revolution in comparative anatomy: the widespread deployment of X-ray tomography and photogrammetry has produced vast databases of trait data, and MorphoBank and Phenome10K now store over 64,200 surface scans for over 7,000 species. Computational tools capable of exploiting these newly generated datasets are now urgently required. For example, with current methods, inference of a 66K-species timetree would require at least 55 years of computing time (extrapolating from some of our previous analyses). Evidently, the efficiency of analytic methods has not kept pace with the volume of data that is available and increasingly required to tackle large-scale questions in evolutionary biology.

In this project we will overcome two major challenges in Bayesian clock dating of species divergences: (i) the mixing and computational limitations of MCMC algorithms in analyses of large datasets, and (ii) the limitations of current models of trait evolution in timetree inference. We will design novel MCMC algorithms that improve mixing efficiency, making use of new ideas about MCMC algorithm design, and we will improve computational efficiency through code improvement and parallelization. We will incorporate advanced trait models to infer timetrees of extant and fossil species. In particular, we will adapt trait models to analyse large genomic trait datasets such as RNA-seq expression data. The newly developed algorithms will be implemented in our MCMCtree software and applied to several large-scale empirical datasets with densely sampled extant and fossil species. The data analyses will provide important motivation for method development and serve to showcase our new software by addressing fundamental questions in evolutionary biology. Our proposal addresses the BBSRC's strategic priorities of "data driven biology" and "system approaches to the biosciences".
Project outputs
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A Mutation-Selection Model of Protein Evolution under Persistent Positive Selection.
- DOI: 10.1093/molbev/msab309
- Publication date: 2022-01-07
- Journal:
- Impact factor: 10.7
- Authors: Tamuri AU; Dos Reis M
- Corresponding author: Dos Reis M
The fossil record of sabre-tooth characins (Teleostei: Characiformes: Cynodontinae), their phylogenetic relationships and palaeobiogeographical implications
- DOI: 10.1080/14772019.2022.2070717
- Publication date: 2022
- Journal:
- Impact factor: 2.6
- Authors: Ballen G
- Corresponding author: Ballen G
A species-level timeline of mammal evolution integrating phylogenomic data
- DOI: 10.1038/s41586-021-04341-1
- Publication date: 2021-12-22
- Journal:
- Impact factor: 64.8
- Authors: Alvarez-Carretero, Sandra; Tamuri, Asif U.; dos Reis, Mario
- Corresponding author: dos Reis, Mario
Other grants held by Mario Jose Dos Reis Barros
Efficient computational technologies to resolve the Timetree of Life: from ancient DNA to species-rich phylogenies
- Grant number: BB/Y003624/1
- Fiscal year: 2024
- Funding amount: $259,800
- Project category: Research Grant
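Similar grants from the National Natural Science Foundation of China
Bayesian optimization methods for input uncertainty in remote-sensing models for estimating farmland biomass
- Grant number: 42301386
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Robust generalization of visual classifiers from a Bayesian perspective
- Grant number: 62302139
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Tensor decomposition models based on Bayesian inference and their application to high-dimensional data
- Grant number: 12301483
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Association between psoriasis incidence and the Five Movements and Six Qi (Wuyun Liuqi), studied with high-dimensional multi-node Bayesian networks
- Grant number: 82374618
- Year approved: 2023
- Funding amount: CNY 480,000
- Project category: General Program
Discrete-space Bayesian adaptive design for estimating missile boundary performance
- Grant number: 12301325
- Year approved: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund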
Similar overseas grants
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $259,800
- Project category: Standard Grant
Collaborative Research: NSFGEO-NERC: Advancing capabilities to model ultra-low velocity zone properties through full waveform Bayesian inversion and geodynamic modeling
- Grant number: 2341238
- Fiscal year: 2024
- Funding amount: $259,800
- Project category: Standard Grant
Bayesian Learning with Model Misspecification
- Grant number: 23K20143
- Fiscal year: 2024
- Funding amount: $259,800
- Project category: Grant-in-Aid for Scientific Research (B)
Rapid, Scalable, and Joint Assessment of Seismic Multi-Hazards and Impacts: From Satellite Images to Causality-Informed Deep Bayesian Networks
- Grant number: 2242590
- Fiscal year: 2024
- Funding amount: $259,800
- Project category: Standard Grant
Collaborative Research: NSFGEO-NERC: Advancing capabilities to model ultra-low velocity zone properties through full waveform Bayesian inversion and geodynamic modeling
- Grant number: 2341237
- Fiscal year: 2024
- Funding amount: $259,800
- Project category: Continuing Grant