Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies

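Basic Information

  • Award number:
    10477336
  • Principal investigator:
  • Amount:
    $362,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-09-01 to 2024-06-30
  • Project status:
    Completed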

Project Abstract

The Human Genome Project and subsequent projects such as 1000 Genomes, the Genome Sequencing Program (GSP), and Trans-Omics for Precision Medicine (TOPMed) provide powerful resources for studying the genetic basis of human disease. Over the last decade, combining these resources and technologies with new statistical and computational methods has led to the identification of thousands of loci associated with disease-related phenotypes, primarily through array-based genome-wide association studies (GWAS) empowered by genotype imputation from sequence-based haplotype reference panels. However, serious problems remain when analyzing these data. (1) Because short-read sequencing data provide only unphased genotypes, statistical phasing methods are used to enable advanced analyses and to generate reference haplotypes for genotype imputation. However, current methods for phasing sequence data produce several thousand switch errors per genome. These phasing errors in turn limit the accuracy of genotype imputation and hamper our ability to study haplotype-aware disease models such as compound heterozygosity. (2) Because rare variants are abundant, high-interest variants must be identified to obtain powerful test statistics. Within exons, the genetic code provides some of the necessary information, but for most of the genome we have very little information that allows us to prioritize variants. (3) Although samples sequenced from diverse and admixed populations are becoming more common, few methods are designed to make use of the unique properties of such data. For example, the distribution of local ancestry in admixed samples generates a distinctive haplotype structure that can be informative about the underlying phase.
Here we propose a set of novel methods to address these challenges. They build on the observation that, in very large datasets, most sequences share a recent common ancestor with at least one other sequence, and that such closely related sequences share long segments (>1 cM) identical by descent (IBD). These IBD segments provide information about the phase of the underlying variants, much as large sibships do. Moreover, the length of an IBD segment provides information about the age of the variants located on it, because longer segments indicate a more recent common ancestor. As young variants are more likely to be under selection, IBD length can be used to prioritize functional noncoding variants. We also aim to leverage the long-range correlation of genotypes in admixed samples to identify phasing errors. In individuals who are heterozygous for ancestry at a locus, a phasing error also changes the local ancestry assigned to each haplotype; detecting these ancestry breaks therefore allows phasing errors to be identified and corrected. We will develop statistical models that build on these ideas and implement them in algorithms efficient enough to be applied to sample sizes >100,000. We will use our algorithms to annotate and re-phase existing large sequencing datasets and thereby improve commonly used imputation reference panels. All software developed in this proposal will be publicly released in user-friendly, well-documented packages.
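The ancestry-based idea can be illustrated with a simple heuristic. This is a minimal sketch under stated assumptions, not the project's statistical model: it assumes one local-ancestry label per site for each haplotype (as a local-ancestry caller might produce), and it treats a simultaneous, reciprocal exchange of labels between the two haplotypes (X|Y becoming Y|X at the same site) as a likely switch error, on the reasoning that true recombination breakpoints on the two haplotypes rarely coincide. The function names `find_reciprocal_ancestry_swaps` and `correct_phase` and the labels `AFR`/`EUR` are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_reciprocal_ancestry_swaps(anc0: Sequence[str], anc1: Sequence[str]) -> List[int]:
    """Return site indices where both haplotypes exchange ancestry labels at once.

    Hypothetical heuristic: in an ancestry-heterozygous region, a switch error
    turns X|Y into Y|X at a single site, whereas recombination usually changes
    only one haplotype's label at a time.
    """
    if len(anc0) != len(anc1):
        raise ValueError("haplotype ancestry tracks must have equal length")
    swaps: List[int] = []
    for i in range(1, len(anc0)):
        heterozygous = anc0[i - 1] != anc1[i - 1]
        reciprocal = anc0[i] == anc1[i - 1] and anc1[i] == anc0[i - 1]
        if heterozygous and reciprocal:
            swaps.append(i)
    return swaps


def correct_phase(
    hap0: Sequence[int], hap1: Sequence[int], swaps: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Undo flagged switch errors by exchanging alleles from each flagged site onward."""
    out0, out1 = list(hap0), list(hap1)
    for start in swaps:
        # Exchanging the two suffixes reverses one switch; two flags at s1 < s2
        # therefore net out to flipping only the segment [s1, s2).
        out0[start:], out1[start:] = out1[start:], out0[start:]
    return out0, out1
```

For example, ancestry tracks `["AFR", "AFR", "EUR", "EUR"]` and `["EUR", "EUR", "AFR", "AFR"]` are flagged at index 2, and `correct_phase([0, 0, 1, 1], [1, 1, 0, 0], [2])` returns `([0, 0, 0, 0], [1, 1, 1, 1])`. Coincident recombination breakpoints would produce false positives, so in practice this signal would need to be weighed against genotype and IBD evidence.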

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Sebastian Zoellner

Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies
  • Award number:
    10251017
  • Fiscal year:
    2020
  • Funding amount:
    $362,000
  • Project category:
Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies
  • Award number:
    10653188
  • Fiscal year:
    2020
  • Funding amount:
    $362,000
  • Project category:
Computational Statistic Approaches to Gene-Environment Interaction
  • Award number:
    7348103
  • Fiscal year:
    2007
  • Funding amount:
    $362,000
  • Project category:
Computational Statistic Approaches to Gene-Environment Interaction
  • Award number:
    7666932
  • Fiscal year:
    2007
  • Funding amount:
    $362,000
  • Project category:

Similar Overseas Grants

Hormone therapy, age of menopause, previous parity, and APOE genotype affect cognition in aging humans.
  • Award number:
    495182
  • Fiscal year:
    2023
  • Funding amount:
    $362,000
  • Project category:
Investigating how alternative splicing processes affect cartilage biology from development to old age
  • Award number:
    2601817
  • Fiscal year:
    2021
  • Funding amount:
    $362,000
  • Project category:
    Studentship
RAPID: Coronavirus Risk Communication: How Age and Communication Format Affect Risk Perception and Behaviors
  • Award number:
    2029039
  • Fiscal year:
    2020
  • Funding amount:
    $362,000
  • Project category:
    Standard Grant
Neighborhood and Parent Variables Affect Low-Income Preschool Age Child Physical Activity
  • Award number:
    9888417
  • Fiscal year:
    2019
  • Funding amount:
    $362,000
  • Project category:
The affect of Age related hearing loss for cognitive function
  • Award number:
    17K11318
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Affect regulation and Beta Amyloid: Maturational Factors in Aging and Age-Related Pathology
  • Award number:
    9320090
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project category:
Affect regulation and Beta Amyloid: Maturational Factors in Aging and Age-Related Pathology
  • Award number:
    10166936
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project category:
Affect regulation and Beta Amyloid: Maturational Factors in Aging and Age-Related Pathology
  • Award number:
    9761593
  • Fiscal year:
    2017
  • Funding amount:
    $362,000
  • Project category:
How age dependent molecular changes in T follicular helper cells affect their function
  • Award number:
    BB/M50306X/1
  • Fiscal year:
    2014
  • Funding amount:
    $362,000
  • Project category:
    Training Grant
Inflamm-aging: What do we know about the effect of inflammation on HIV treatment and disease as we age, and how does this affect our search for a Cure?
  • Award number:
    288272
  • Fiscal year:
    2013
  • Funding amount:
    $362,000
  • Project category:
    Miscellaneous Programs