Scaling up computational genomics with tree sequences
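Basic Information
- Award number: 10471496
- Principal investigator: Peter Lochhead Ralph
- Amount: $556,800
- Host institution country: United States
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-24 to 2023-08-31
- Project status: Completed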
- Keywords: Address, Affect, Algorithmic Software, Algorithms, Architecture, Area, Base Sequence, Collection, Communities, Complex, Computer software, Computing Methodologies, Data, Data Compression, Data Set, Development, Disease, Ecology, Ensure, Epidemiology, Etiology, Evolution, Genealogical Tree, Genealogy, Generations, Genetic, Genetic Processes, Genetic Recombination, Genetic Variation, Genome, Genomics, Genotype, Goals, Haplotypes, Health, Health Benefit, Human, Human Genetics, Human Genome, Individual, Internet, Libraries, Maps, Methods, Modeling, Modernization, Mutation, Performance, Phase, Phenotype, Population, Population Genetics, Population Sizes, Positioning Attribute, Process, Production, Recording of previous events, Records, Research, Running, Sample Size, Sampling, Statistical Data Interpretation, Structure, Testing, Time, Training, Trees, Tsunami, Validation, Variant, Work, algorithm development, base, computer framework, cost, data format, data reuse, data structure, deep learning, design, experience, frontier, genome-wide, genomic data, human disease, improved, interoperability, learning strategy, member, multicore processor, next generation, novel, novel strategies, open source, operation, scale up, sequence learning, simulation, statistics, structural genomics, success, supervised learning, whole genome
Project Summary/Abstract
Increasing sample size is a tremendously important factor in building our understanding of the genetics of human disease. As we discover that more and more diseases have a complex web of genetic causation, we need larger and larger genetic datasets to disentangle them, and to ultimately produce successful therapies. Driven in part by this need, the community is now assembling vast collections of human genome sequences, and millions of samples will soon be commonplace. Nonhuman datasets, with applications in epidemiology, ecology, and evolution, will not be far behind. There is a profound problem, however: our computational methods for storing, processing, simulating, and analyzing genomic data are lagging far behind our ability to collect such data. The algorithms and data structures underlying today's computational methods were designed for thousands of samples, not millions, and we are in danger of being overwhelmed by the impending tsunami of data. Without a fundamental change in how we store and process genomic data, we will either not fully tap the potential of the data we collect, or the computational costs will be astronomical, or both.
Our proposal addresses this critical need by focusing on a new data structure: the succinct tree sequence. This data structure (the “tree sequence”, for brevity) encodes genetic variation data using the population genetics processes that produced the data itself, by representing variation among contemporary samples via mutations on the branches of the underlying genealogical trees. This yields extraordinary levels of data compression, with file sizes hundreds of times smaller than current community standards. Since the tree sequence was introduced in 2016, it has led to performance increases of 2–4 orders of magnitude in the diverse applications of genome simulation, calculation of statistics, and ancestry inference. Such sudden leaps in computational performance are vanishingly rare, and only possible through deep algorithmic advances.
Our research plan builds on the extraordinary successes of tree sequence methods so far, scaling up three crucial layers of computational genomics: analysis, simulation, and inference. First, we will continue our development of highly efficient tree-sequence-based methods for fundamental operations in statistical and population genetics. Second, we will scale up genome simulations by integrating tree sequence methods into complex forward-time simulations, utilizing modern, multicore processors. Third, we will combine efficient genome simulations with cutting-edge deep-learning methods to improve existing inference methods, both of tree sequences from genomic data, and of population parameters from novel tree-sequence encodings of genotype data. Together, we aim to revolutionize the way we work with population genetic variation data, and how we use it to understand human health and evolutionary processes.
Our experienced, interdisciplinary team is committed to producing rigorously tested and validated software and accessible, interoperable, and reusable data formats through inclusive and open development.
Project Outcomes
Journal articles: 7
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Looking forwards and backwards: Dynamics and genealogies of locally regulated populations
- DOI: 10.1214/24-ejp1075
- Published: 2024-01-01
- Impact factor: 1.4
- Authors: Etheridge, Alison M.; Kurtz, Thomas G.; Lung, Terence Tsui Ho
- Corresponding author: Lung, Terence Tsui Ho
tstrait: a quantitative trait simulator for ancestral recombination graphs.
- DOI: 10.1101/2024.03.13.584790
- Published: 2024
- Impact factor: 0
- Authors: Tagami, Daiki; Bisschop, Gertjan; Kelleher, Jerome
- Corresponding author: Kelleher, Jerome
Estimating evolutionary and demographic parameters via ARG-derived IBD.
- DOI: 10.1101/2024.03.07.583855
- Published: 2024
- Impact factor: 0
- Authors: Huang, Zhendong; Kelleher, Jerome; Chan, Yao-Ban; Balding, David J.
- Corresponding author: Balding, David J.
link-ancestors: fast simulation of local ancestry with tree sequence software.
- DOI: 10.1093/bioadv/vbad163
- Published: 2023
- Impact factor: 0
Genetic architecture, spatial heterogeneity, and the coevolutionary arms race between newts and snakes.
- DOI: 10.1101/2023.12.07.570693
- Published: 2024
- Impact factor: 0
- Authors: Caudill, Victoria; Ralph, Peter L.
- Corresponding author: Ralph, Peter L.
Other grants by Peter Lochhead Ralph
Scaling up computational genomics with tree sequences
- Award number: 10585745
- Fiscal year: 2023
- Amount: $556,800
Similar international grants
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Award number: BB/Z514391/1
- Fiscal year: 2024
- Amount: $556,800
- Project category: Training Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Award number: 2312555
- Fiscal year: 2024
- Amount: $556,800
- Project category: Standard Grant
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Award number: 2327346
- Fiscal year: 2024
- Amount: $556,800
- Project category: Standard Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Award number: ES/Z502595/1
- Fiscal year: 2024
- Amount: $556,800
- Project category: Fellowship
Building Affect-X, an index of individual differences in sensibility (kansei), and establishing a foundation for bespoke AI services
- Award number: 23K24936
- Fiscal year: 2024
- Amount: $556,800
- Project category: Grant-in-Aid for Scientific Research (B)
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Award number: ES/Z000149/1
- Fiscal year: 2024
- Amount: $556,800
- Project category: Research Grant
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Award number: 2901648
- Fiscal year: 2024
- Amount: $556,800
- Project category: Studentship
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Award number: 488039
- Fiscal year: 2023
- Amount: $556,800
- Project category: Operating Grants
New Tendencies of French Film Theory: Representation, Body, Affect
- Award number: 23K00129
- Fiscal year: 2023
- Amount: $556,800
- Project category: Grant-in-Aid for Scientific Research (C)
The Protruding Void: Mystical Affect in Samuel Beckett's Prose
- Award number: 2883985
- Fiscal year: 2023
- Amount: $556,800
- Project category: Studentship
