Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies

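Basic information

  • Grant number:
    10653188
  • Principal investigator:
  • Amount:
    $365,100
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-09-01 to 2024-06-30
  • Project status:
    Completed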

Project abstract

The Human Genome Project and subsequent projects such as 1000 Genomes, the Genome Sequencing Program (GSP), and Trans-Omics for Precision Medicine (TOPMed) are providing powerful resources for studying the genetic basis of human diseases. Combining these resources and technologies with the development of new statistical and computational methods has in the last decade led to the identification of thousands of loci associated with disease-related phenotypes, primarily through array-based genome-wide association studies (GWAS) empowered by genotype imputation from sequence-based haplotype panels. However, serious problems remain when analyzing these data: (1) As short-read sequencing data provide only unphased genotypes, statistical phasing methods are used to enable advanced analyses and to generate reference haplotypes for genotype imputation. However, current methods for phasing sequence data result in several thousand switch errors per genome. These phasing errors in turn limit the accuracy of genotype imputation and hamper our ability to study haplotype-aware disease models such as compound heterozygotes. (2) Due to the abundance of rare variants, it is necessary to identify high-interest variants to obtain powerful test statistics. Within exons, the genetic code provides some of the necessary information, but for most of the genome we have very little information that allows us to prioritize variants. (3) While samples sequenced from diverse and admixed populations are becoming more common, few methods are designed to make use of the unique properties of such data. For example, the distribution of local ancestry in admixed samples generates a unique haplotype structure that can be informative about the underlying phasing.
Here we propose a set of novel methods to address these challenges. They rest on the observation that in very large datasets most sequences will have a recent common ancestor with at least one other sequence, and that these closely related sequences will share long segments (>1 cM) identical by descent (IBD). These IBD segments provide information about the phasing of the underlying variants, similar to large sibships. Moreover, the length of an IBD segment provides information about the age of the variants located on it. As young variants are more likely to be under selection, IBD length can be used to prioritize functional noncoding variants. We also aim to leverage the long-distance correlation of genotypes in admixed samples to identify phasing errors. As phasing errors also change the local ancestry of a sample in individuals of heterozygous ancestry, identifying these breaks allows us to identify and correct phasing errors. We will develop statistical models that leverage these conceptual ideas and implement these methods in algorithms efficient enough to be applied to sample sizes >100,000. We will use our algorithms to annotate and re-phase existing large sequencing datasets and thus improve commonly used imputation reference panels. All software developed in this proposal will be publicly released in user-friendly, well-documented packages.
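To make the notion of a switch error concrete, here is a minimal Python sketch. It is an illustration only, not code from this project: the function name `count_switch_errors` and the toy input encoding (one 0/1 allele per heterozygous site, giving the allele on the first haplotype) are assumptions chosen for this example. A switch error is counted whenever the inferred phase changes its orientation relative to the true phase between two adjacent heterozygous sites.

```python
def count_switch_errors(true_hap, inferred_hap):
    """Count switch errors between two phasings of the same heterozygous sites.

    Each argument lists, per heterozygous site, the allele (0 or 1) carried
    on the first haplotype; the second haplotype carries the other allele.
    (Hypothetical encoding, chosen only for this illustration.)
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("phasings must cover the same sites")
    # orientation[i] is True when the inferred phase agrees with the truth at site i
    orientation = [t == p for t, p in zip(true_hap, inferred_hap)]
    # A switch error is a change of orientation between adjacent sites
    return sum(a != b for a, b in zip(orientation, orientation[1:]))


truth = [0, 1, 1, 0, 1, 0]
inferred = [0, 1, 0, 1, 0, 0]
print(count_switch_errors(truth, inferred))  # prints 2
```

Note that a phasing in which every site is flipped counts as zero switch errors under this definition, because the labeling of the two haplotypes is arbitrary.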

Project outputs

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
FICTURE: Scalable segmentation-free analysis of submicron resolution spatial transcriptomics.
  • DOI:
    10.1101/2023.11.04.565621
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Si, Yichen; Lee, ChangHee; Hwang, Yongha; Yun, JeongH; Cheng, Weiqiu; Cho, Chun-Seok; Quiros, Miguel; Nusrat, Asma; Zhang, Weizhou; Jun, Goo; Zöllner, Sebastian; Lee, JunHee; Kang, HyunMin
  • Corresponding author:
    Kang, HyunMin
SiftCell: A robust framework to detect and isolate cell-containing droplets from single-cell RNA sequence reads.
  • DOI:
    10.1016/j.cels.2023.06.002
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    9.3
  • Authors:
    Xi, Jingyue; Park, SungRye; Lee, JunHee; Kang, HyunMin
  • Corresponding author:
    Kang, HyunMin
Seq-Scope Protocol: Repurposing Illumina Sequencing Flow Cells for High-Resolution Spatial Transcriptomics.
  • DOI:
    10.1101/2024.03.29.587285
  • Publication year:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Kim, Yongsung; Cheng, Weiqiu; Cho, Chun-Seok; Hwang, Yongha; Si, Yichen; Park, Anna; Schrank, Mitchell; Hsu, Jer-En; Xi, Jingyue; Kim, Myungjin; Pedersen, Ellen; Koues, OliviaI; Wilson, Thomas; Jun, Goo; Kang, HyunMin; Lee, JunHee
  • Corresponding author:
    Lee, JunHee

Other grants by Sebastian Zoellner

Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies
  • Grant number:
    10477336
  • Fiscal year:
    2020
  • Amount:
    $365,100
  • Project type:
Leveraging long-range haplotypes in sequencing data to advance large scale genetic studies
  • Grant number:
    10251017
  • Fiscal year:
    2020
  • Amount:
    $365,100
  • Project type:
Computational Statistic Approaches to Gene-Environment Interaction
  • Grant number:
    7348103
  • Fiscal year:
    2007
  • Amount:
    $365,100
  • Project type:
Computational Statistic Approaches to Gene-Environment Interaction
  • Grant number:
    7666932
  • Fiscal year:
    2007
  • Amount:
    $365,100
  • Project type:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Standard Grant
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    EU-Funded
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Amount:
    $365,100
  • Project type:
    Research Grant