Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies

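Basic information

  • Grant number:
    10477336
  • Principal investigator:
  • Amount:
    $362,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-09-01 to 2024-06-30
  • Project status:
    Completed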

Project abstract

The Human Genome Project and subsequent projects such as 1000 Genomes, the Genome Sequencing Program (GSP), and Trans-Omics for Precision Medicine (TOPMed) are providing powerful resources for studying the genetic basis of human diseases. Combining these resources and technologies with the development of new statistical and computational methods has, in the last decade, led to the identification of thousands of loci associated with disease-related phenotypes, primarily through array-based genome-wide association studies (GWAS) empowered by genotype imputation from sequence-based haplotype panels. However, serious problems remain when analyzing these data: (1) Because short-read sequencing data provide only unphased genotypes, statistical phasing methods are used to enable advanced analyses and to generate reference haplotypes for genotype imputation. However, current methods for phasing sequence data produce several thousand switch errors per genome. These phasing errors in turn limit the accuracy of genotype imputation and hamper our ability to study haplotype-aware disease models such as compound heterozygotes. (2) Because rare variants are so abundant, it is necessary to identify high-interest variants to obtain powerful test statistics. Within exons, the genetic code provides some of the necessary information, but for most of the genome we have very little information that allows us to prioritize variants. (3) While samples sequenced from diverse and admixed populations are becoming more common, few methods are designed to make use of the unique properties of such data. For example, the distribution of local ancestry in admixed samples generates a unique haplotype structure that can be informative about the underlying phasing.
Here we propose a set of novel methods that will address these challenges. They rest on the observation that in very large datasets most sequences will have a recent common ancestor with at least one other sequence, and that these closely related sequences will share long segments (>1 cM) identical by descent (IBD). These IBD segments provide information about the phasing of the underlying variants, similar to large sibships. Moreover, the length of an IBD segment provides information about the age of the variants located on it. As young variants are more likely to be under selection, IBD length can be used to prioritize functional noncoding variants. We also aim to leverage the long-distance correlation of genotypes in admixed samples to identify phasing errors. Because phasing errors also change the apparent local ancestry of a haplotype in individuals of heterozygous ancestry, detecting these breaks allows phasing errors to be identified and corrected. We will develop statistical models that leverage these conceptual ideas and implement these methods in algorithms efficient enough to be applied to sample sizes >100,000. We will use our algorithms to annotate and re-phase existing large sequencing datasets and thus improve commonly used imputation reference panels. All software developed in this proposal will be publicly released in user-friendly, well-documented packages.
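The abstract refers to "switch errors" without defining how they are counted. As a minimal illustrative sketch (not the project's software), the Python below counts them under a common convention: compare an inferred phasing with a trusted one at heterozygous sites only, and count each place where agreement flips between consecutive sites. The function names and the 0/1 encoding of the allele on the first haplotype are assumptions made for this example.

```python
from typing import Sequence


def count_switch_errors(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> int:
    """Count switch errors between two phasings of the same heterozygous sites.

    Each sequence gives, for every heterozygous site in genomic order, the
    allele (0 or 1) assigned to the first haplotype. This encoding is an
    assumption of the sketch.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("both phasings must cover the same sites")
    # agrees[i] is True when the inferred phase matches the truth at site i.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error is a change in agreement between adjacent sites.
    # A phasing that is flipped everywhere therefore has zero switch errors,
    # because the labels "first" and "second" haplotype are arbitrary.
    return sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Switch errors divided by the number of adjacent heterozygous site pairs."""
    n_pairs = len(true_hap) - 1
    if n_pairs < 1:
        return 0.0
    return count_switch_errors(true_hap, inferred_hap) / n_pairs


truth = [0, 1, 1, 0, 1, 0]
inferred = [0, 1, 0, 1, 0, 0]
print(count_switch_errors(truth, inferred))  # 2
print(switch_error_rate(truth, inferred))    # 0.4
```

In this toy example the inferred haplotype follows the truth for two sites, follows the opposite haplotype for three sites, then switches back, giving two switch errors over five adjacent pairs.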

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Sebastian Zoellner

Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies
  • Grant number:
    10251017
  • Fiscal year:
    2020
  • Funding amount:
    $362,000
  • Project type:
Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies
  • Grant number:
    10653188
  • Fiscal year:
    2020
  • Funding amount:
    $362,000
  • Project type:
Computational Statistic Approaches to Gene-Environment Interaction
  • Grant number:
    7348103
  • Fiscal year:
    2007
  • Funding amount:
    $362,000
  • Project type:
Computational Statistic Approaches to Gene-Environment Interaction
  • Grant number:
    7666932
  • Fiscal year:
    2007
  • Funding amount:
    $362,000
  • Project type:

Similar overseas grants

Hormone therapy, age of menopause, previous parity, and APOE genotype affect cognition in aging humans.
  • Grant number:
    495182
  • Fiscal year:
    2023
  • Funding amount:
    $362,000
  • Project type:
Investigating how alternative splicing processes affect cartilage biology from development to old age
  • Grant number:
    2601817
  • Fiscal year:
    2021
  • Funding amount:
    $362,000
  • Project type:
    Studentship
RAPID: Coronavirus Risk Communication: How Age and Communication Format Affect Risk Perception and Behaviors
  • Grant number:
    2029039
  • Fiscal year:
    2020
  • Funding amount:
    $362,000
  • Project type:
    Standard Grant
Neighborhood and Parent Variables Affect Low-Income Preschool Age Child Physical Activity
  • Grant number:
    9888417
  • Fiscal year:
    2019
  • Funding amount:
    $362,000
  • Project type:
The affect of Age related hearing loss for cognitive function
  • Grant number:
    17K11318
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Affect regulation and Beta Amyloid: Maturational Factors in Aging and Age-Related Pathology
  • Grant number:
    9320090
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project type:
Affect regulation and Beta Amyloid: Maturational Factors in Aging and Age-Related Pathology
  • Grant number:
    10166936
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project type:
Affect regulation and Beta Amyloid: Maturational Factors in Aging and Age-Related Pathology
  • Grant number:
    9761593
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project type:
How age dependent molecular changes in T follicular helper cells affect their function
  • Grant number:
    BB/M50306X/1
  • Fiscal year:
    2014
  • Funding amount:
    $362,000
  • Project type:
    Training Grant
Inflamm-aging: What do we know about the effect of inflammation on HIV treatment and disease as we age, and how does this affect our search for a Cure?
  • Grant number:
    288272
  • Fiscal year:
    2013
  • Funding amount:
    $362,000
  • Project type:
    Miscellaneous Programs