Direct whole-genome haplotype-resolved assembly using sequence graphs

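Basic information

  • Grant number:
    10015321
  • Principal investigator:
  • Amount:
    $48,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Start and end dates:
    2019-09-10 to 2021-01-03
  • Project status:
    Completed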

Project abstract

The lack of complete, high-quality sequencing of human genomes is a major bottleneck for accurate and complete analyses in population and medical genetics. Advances in a variety of sequencing technologies have created enormous opportunities to yield full assemblies of every chromosome and its homologue (called haplotypes). The reconstruction of haplotype sequences from sequencing data is known as diploid assembly or haplotype-aware de novo assembly. Standard de novo assemblers are limited in their ability to combine mixed data types, and they also collapse haplotype sequences, resulting in expensive, discontinuous, and inaccurate assemblies. Our interim goal is a finished human genome that would not only reveal the last remaining regions of the genome, but also benefit downstream analyses by providing an unbiased reference for comparison and mapping, as well as the complete phased sequencing of several human and non-human genomes for specific research projects. This project will develop a novel computational toolkit, WHdenovo, that can optimally combine various sequencing data types to generate phased assemblies of single individuals and pedigrees. In aim 1 (K99 phase), I will provide computationally efficient tools that are easy to use, open source, and production-level for generating diploid assemblies of pedigrees at minimal cost. In aim 2 (R00 phase), I will develop novel computational tools for generating pedigree-independent diploid assemblies of single individuals over whole genomes, including centromeres. In aim 3 (R00 phase), the tools developed during aims 1 and 2 will be applied to generating diploid assemblies of diverse human and non-human genomes, and of clinically relevant regions such as the major histocompatibility complex (MHC) and the killer cell immunoglobulin-like receptor (KIR) region.
My goal is to design tools that will be useful to large consortia such as Genome in a Bottle, High Quality Human Reference Genomes, and the Personal Genome Project. My extensive background in computational biology puts me in a unique position to accomplish this proposal, which requires a seamless integration between data science and genomics. Career and Training: I received my PhD in Computer Science at the Max Planck Institute for Informatics, and started postdoctoral research in the lab of Professor George Church at Harvard Medical School. During the K99 phase, I will continue to be mentored by Professor Church. Under the supervision of co-mentor Heng Li, I will advance my expertise in making computational tools efficient in practice and in tuning them for upcoming novel high-throughput sequencing (HTS) datasets. This proposed plan would prepare me to be an independent bioinformatics research scientist.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Shilpa Garg

Similar overseas grants

Predoctoral Training in Bioinformatics and Computational Biology
  • Grant number:
    10715126
  • Fiscal year:
    2023
  • Funding amount:
    $48,000
  • Project category:
UCLA Pediatric Research Education Program in Bioinformatics, Computational Biology, and Omics
  • Grant number:
    10629061
  • Fiscal year:
    2023
  • Funding amount:
    $48,000
  • Project category:
Using Computational Intelligence for Bioinformatics and Computational Biology
  • Grant number:
    575765-2022
  • Fiscal year:
    2022
  • Funding amount:
    $48,000
  • Project category:
    Alexander Graham Bell Canada Graduate Scholarships - Master's
Graduate Training Program in Computational Biology, Bioinformatics and Biomedical Data Science (CBB)
  • Grant number:
    10654859
  • Fiscal year:
    2022
  • Funding amount:
    $48,000
  • Project category:
Conference: Conference on Bioinformatics, Computational Biology, and Health Informatics 2022
  • Grant number:
    2233805
  • Fiscal year:
    2022
  • Funding amount:
    $48,000
  • Project category:
    Standard Grant
Systems Biology of Early Atopy (SUNBEAM) Analysis and Bioinformatics Center
  • Grant number:
    10573523
  • Fiscal year:
    2022
  • Funding amount:
    $48,000
  • Project category:
Systems Biology, Bioinformatics, & Data Integration
  • Grant number:
    10459538
  • Fiscal year:
    2021
  • Funding amount:
    $48,000
  • Project category:
Systems Biology, Bioinformatics, & Data Integration
  • Grant number:
    10653908
  • Fiscal year:
    2021
  • Funding amount:
    $48,000
  • Project category:
MCA: Application of Quantum Computing in Bioinformatics and Computational Biology
  • Grant number:
    2120949
  • Fiscal year:
    2021
  • Funding amount:
    $48,000
  • Project category:
    Standard Grant
Predoctoral Training Program in Bioinformatics and Computational Biology
  • Grant number:
    10641034
  • Fiscal year:
    2021
  • Funding amount:
    $48,000
  • Project category: