Scalable Computational Methods for Genealogical Inference: from species level to single cells

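Basic Information

  • Grant number:
    10889303
  • Principal investigator:
  • Amount:
    $315,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Project period:
    2023-09-01 to 2024-08-31
  • Project status:
    Completed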

Project Summary

Massive amounts of genomic data are currently being generated, providing unprecedented opportunities for biomedical researchers to characterize various biological components and processes. To use these data to make new biological discoveries and improve human health, accurate models and scalable computational tools need to be developed to facilitate analysis and interpretation. The central objective of this project is to address this challenge by developing more realistic probabilistic models, scalable algorithms, and user-friendly software tools that enable the biomedical research community to better harness large genomic data. Many problems in genomics rely on computational methods for inferring genealogical information from large sequence data and interpreting the reconstructed trees. In this application, we propose to make significant strides towards improving this line of research by developing a suite of robust and scalable algorithms for probabilistic models of molecular evolution and genealogical inference across multiple timescales. We will achieve our goal by carrying out the following specific aims:

1) A fundamental problem in statistical analysis of molecular evolution is estimating model parameters, for which maximum likelihood estimation (MLE) is typically employed. Unfortunately, MLE is a computationally expensive task, in some cases prohibitively so. In Aim 1, we will utilize a novel MLE framework and modern optimization methods to develop a broadly applicable computational method that achieves several orders of magnitude speedup in MLE while maintaining high statistical efficiency for general models of molecular evolution. We will apply our tools to improve phylogenetic inference for two clinically important superfamilies of membrane proteins in humans, namely G protein-coupled receptors (GPCRs) and solute carrier (SLC) transporters.

2) Because of meiotic recombination, the genetic variability within humans cannot be represented by a single tree. Instead, there are millions of different trees across the genome, where each position in the genome tends to have its own tree that differs only minimally from the trees at nearby sites. The collection of all these trees, together with the set of recombination points creating new trees, is represented by the Ancestral Recombination Graph (ARG), which has a number of applications in human genetics. Despite substantial recent progress on reconstructing ARGs, current methods are either too slow to scale up to large data sets, or they do not sample ARGs accurately from a well-calibrated posterior distribution. In Aim 2, we will develop a new scalable computational method to improve ARG reconstruction and sampling. We will test the method extensively on simulated data, develop a number of applications, and apply it to a number of different human data sets to illustrate its utility.

3) Applications of genealogical inference methods have been growing rapidly in single-cell genomics. In particular, advances in CRISPR/Cas9 genome editing technologies have enabled lineage tracing for thousands of cells in vivo, and the problem of reconstructing trees from such data has received considerable attention recently. In Aim 3, we will develop scalable algorithms to reconstruct time-resolved single-cell trees for thousands of cells sampled at multiple time points. We will also develop a novel statistical method grounded in rigorous theory to improve fitness estimation from trees. We will apply the methods developed here to analyze single-cell lineage-tracing data from an iterative metastasis experiment to study cancer evolution, as well as B cell affinity maturation data from a highly innovative experimental design to study germinal center evolution.

Project Outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Exact and efficient phylodynamic simulation from arbitrarily large populations.
  • DOI:
  • Publication year:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Celentano, Michael; DeWitt, William S; Prillo, Sebastian; Song, Yun S
  • Corresponding author:
    Song, Yun S

Other grants held by Ian H Holmes

Web-based visualization of coronavirus genomes and proteins
  • Grant number:
    10162044
  • Fiscal year:
    2020
  • Award amount:
    $315,000
  • Project category:
Developing the JBrowse Genome Browser to Visualize Structural Variants and Cancer Genomics Data
  • Grant number:
    9751259
  • Fiscal year:
    2017
  • Award amount:
    $315,000
  • Project category:
Developing the JBrowse Genome Browser to Visualize Structural Variants and Cancer Genomics Data
  • Grant number:
    9390007
  • Fiscal year:
    2017
  • Award amount:
    $315,000
  • Project category:
Developing the JBrowse Genome Browser to Visualize Structural Variants and Cancer Genomics Data
  • Grant number:
    9524813
  • Fiscal year:
    2017
  • Award amount:
    $315,000
  • Project category:
Enhancing the GMOD Suite of Genome Annotation and Visualization Tools
  • Grant number:
    8108959
  • Fiscal year:
    2007
  • Award amount:
    $315,000
  • Project category:
Enhancement of the GBrowse Genome Annotation Browser
  • Grant number:
    7487905
  • Fiscal year:
    2007
  • Award amount:
    $315,000
  • Project category:
Developing the Apollo software for high-throughput annotation of multiple genomes
  • Grant number:
    10736567
  • Fiscal year:
    2007
  • Award amount:
    $315,000
  • Project category:
Apollo - Universal Infrastructure for Genome Curation
  • Grant number:
    10176512
  • Fiscal year:
    2007
  • Award amount:
    $315,000
  • Project category:
Enhancements to the GMOD Suite of Genome Annotation and Visualization Tools
  • Grant number:
    9920732
  • Fiscal year:
    2007
  • Award amount:
    $315,000
  • Project category:
Enhancement of the GBrowse Genome Annotation Browser
  • Grant number:
    8151702
  • Fiscal year:
    2007
  • Award amount:
    $315,000
  • Project category:

Similar Overseas Grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Award amount:
    $315,000
  • Project category:
    Research Grant