Scaling up computational genomics with tree sequences


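Basic information

  • Grant number:
    10471496
  • Principal investigator:
    Peter Lochhead Ralph
  • Amount:
    $556,800
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-09-24 to 2023-08-31
  • Project status:
    Completed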

Project abstract

Project Summary/Abstract

Increasing sample size is a tremendously important factor in building our understanding of the genetics of human disease. As we discover that more and more diseases have a complex web of genetic causation, we need larger and larger genetic datasets to disentangle them, and to ultimately produce successful therapies. Driven in part by this need, the community is now assembling vast collections of human genome sequences, and millions of samples will soon be commonplace. Nonhuman datasets, with applications in epidemiology, ecology, and evolution, will not be far behind. There is a profound problem, however: our computational methods for storing, processing, simulating, and analyzing genomic data are lagging far behind our ability to collect such data. The algorithms and data structures underlying today's computational methods were designed for thousands of samples, not millions, and we are in danger of being overwhelmed by the impending tsunami of data. Without a fundamental change in how we store and process genomic data, we will either not fully tap the potential of the data we collect, or the computational costs will be astronomical, or both.

Our proposal addresses this critical need by focusing on a new data structure: the succinct tree sequence. This data structure (the “tree sequence”, for brevity) encodes genetic variation data using the population genetics processes that produced the data itself, by representing variation among contemporary samples via mutations on the branches of the underlying genealogical trees. This yields extraordinary levels of data compression, with file sizes hundreds of times smaller than current community standards. Since the tree sequence was introduced in 2016, it has led to performance increases of 2–4 orders of magnitude in the diverse applications of genome simulation, calculation of statistics, and ancestry inference. Such sudden leaps in computational performance are vanishingly rare, and only possible through deep algorithmic advances.

Our research plan builds on the extraordinary successes of tree sequence methods so far, scaling up three crucial layers of computational genomics: analysis, simulation, and inference. First, we will continue our development of highly efficient tree-sequence-based methods for fundamental operations in statistical and population genetics. Second, we will scale up genome simulations by integrating tree sequence methods into complex forward-time simulations, utilizing modern, multicore processors. Third, we will combine efficient genome simulations with cutting-edge deep-learning methods to improve existing inference methods, both of tree sequences from genomic data, and of population parameters from novel tree-sequence encodings of genotype data. Together, we aim to revolutionize the way we work with population genetic variation data, and how we use it to understand human health and evolutionary processes.

Our experienced, interdisciplinary team is committed to producing rigorously tested and validated software and accessible, interoperable, and reusable data formats through inclusive and open development.
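The abstract describes the tree sequence encoding only in words. Variation among samples is stored as mutations on the branches of a genealogical tree, and a sample's genotype is recovered by asking whether it lies below a mutation. The following is a minimal, hypothetical Python sketch of that idea for a single tree with four samples. The parent-array layout, node numbering and function names are illustrative assumptions, not the tskit API or the project's own code. It also hints at why statistics can be computed quickly: an allele frequency can be read from the number of samples below the mutated node, without building the genotype matrix.

```python
# Toy sketch (hypothetical layout, not the tskit API): one genealogical tree
# over four sample nodes, stored as a parent array. Node IDs are chosen so
# that every parent has a larger ID than its children.
#
#            6
#          /   \
#         4     5
#        / \   / \
#       0   1 2   3
PARENT = [4, 4, 5, 5, 6, 6, -1]  # PARENT[u] is the parent of node u; -1 marks the root
SAMPLES = [0, 1, 2, 3]

# Each site stores only the node whose branch carries its single mutation
# (ancestral state 0, derived state 1), instead of one genotype per sample.
MUTATIONS = {0: 4, 1: 2, 2: 1}  # site -> node below the mutated branch


def is_below(u, node, parent):
    """Return True if node u is `node` itself or one of its descendants."""
    while u != -1:
        if u == node:
            return True
        u = parent[u]
    return False


def genotype_matrix(parent, samples, mutations):
    """Decode a sites-by-samples 0/1 matrix: a sample carries the derived
    allele exactly when it lies below the mutated branch."""
    return [
        [1 if is_below(s, node, parent) else 0 for s in samples]
        for _, node in sorted(mutations.items())
    ]


def samples_below(parent, samples):
    """Count the samples under every node in a single sweep over the nodes.

    Relies on the toy assumption that children have smaller IDs than their
    parents, so each child's count is final before it is added upward.
    """
    counts = [0] * len(parent)
    for s in samples:
        counts[s] += 1
    for u in range(len(parent)):
        if parent[u] != -1:
            counts[parent[u]] += counts[u]
    return counts


def allele_frequencies(parent, samples, mutations):
    """Derived-allele frequency per site, read off the tree without
    building the genotype matrix."""
    counts = samples_below(parent, samples)
    return {site: counts[node] / len(samples) for site, node in mutations.items()}


if __name__ == "__main__":
    for row in genotype_matrix(PARENT, SAMPLES, MUTATIONS):
        print(row)
    print(allele_frequencies(PARENT, SAMPLES, MUTATIONS))
```

Running it prints the genotype rows [1, 1, 0, 0], [0, 0, 1, 0] and [0, 1, 0, 0], followed by {0: 0.5, 1: 0.25, 2: 0.25}. A real tree sequence holds many such trees along the genome, each covering a genomic interval and sharing most of its branches with its neighbours. It can also carry several mutations per site. This toy ignores both.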

Project outputs

Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Looking forwards and backwards: Dynamics and genealogies of locally regulated populations
  • DOI:
    10.1214/24-ejp1075
  • Published:
    2024-01-01
  • Journal:
  • Impact factor:
    1.4
  • Authors:
    Etheridge, Alison M.; Kurtz, Thomas G.; Lung, Terence Tsui Ho
  • Corresponding author:
    Lung, Terence Tsui Ho
tstrait: a quantitative trait simulator for ancestral recombination graphs.
Estimating evolutionary and demographic parameters via ARG-derived IBD.
  • DOI:
    10.1101/2024.03.07.583855
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Huang, Zhendong; Kelleher, Jerome; Chan, Yao-Ban; Balding, David J
  • Corresponding author:
    Balding, David J
link-ancestors: fast simulation of local ancestry with tree sequence software.
  • DOI:
    10.1093/bioadv/vbad163
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
Genetic architecture, spatial heterogeneity, and the coevolutionary arms race between newts and snakes.

Other grants held by Peter Lochhead Ralph

Scaling up computational genomics with tree sequences
  • Grant number:
    10585745
  • Fiscal year:
    2023
  • Award amount:
    $556,800
  • Project category:
Geographic models of selective sweeps
  • Grant number:
    8370584
  • Fiscal year:
    2011
  • Award amount:
    $556,800
  • Project category:
Geographic models of selective sweeps
  • Grant number:
    8198779
  • Fiscal year:
    2011
  • Award amount:
    $556,800
  • Project category:

Similar overseas grants

RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
  • Grant number:
    2327346
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Standard Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
  • Grant number:
    2312555
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Standard Grant
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
  • Grant number:
    BB/Z514391/1
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Training Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
  • Grant number:
    ES/Z502595/1
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Fellowship
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
  • Grant number:
    ES/Z000149/1
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Research Grant
Construction of Affect-X, an index of individual differences in kansei (sensibility), and establishment of a foundation for bespoke AI services
  • Grant number:
    23K24936
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Grant-in-Aid for Scientific Research (B)
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
  • Grant number:
    2901648
  • Fiscal year:
    2024
  • Award amount:
    $556,800
  • Project category:
    Studentship
ERI: Developing a Trust-supporting Design Framework with Affect for Human-AI Collaboration
  • Grant number:
    2301846
  • Fiscal year:
    2023
  • Award amount:
    $556,800
  • Project category:
    Standard Grant
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
  • Grant number:
    488039
  • Fiscal year:
    2023
  • Award amount:
    $556,800
  • Project category:
    Operating Grants
How motor impairments due to neurodegenerative diseases affect masticatory movements
  • Grant number:
    23K16076
  • Fiscal year:
    2023
  • Award amount:
    $556,800
  • Project category:
    Grant-in-Aid for Early-Career Scientists