
Optimized workflows for structural variant analysis of the Kids First genomes using short and long reads


Basic Information

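Grant number:
10602532
Principal investigator:
MICHAEL SCHATZ
Amount:
About $156,000
Host institution:
JOHNS HOPKINS UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-04-01 to 2025-03-31
Project status:
Ongoing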

Source:
https://reporter.nih.gov/project-details/10602532
Keywords:
Address Affect Algorithms Base Pairing Biological Sciences Child Code Complex Confusion Data Analyses Data Set Development Disease Ensure Ethnic Origin Etiology Family member Fostering Genes Genetic Genetic Variation Genome Genomics Genotype Goals Human Genome Individual Jasminum Malignant Childhood Neoplasm Medical Mutation Patients Pediatric Research Phase Pilot Projects Population Proteins Repetitive Sequence Reproducibility Research Research Personnel Resolution Sampling Structural Congenital Anomalies Technology Time Variant X Chromosome autosome cloud based cohort data resource driver mutation genetic analysis genetic pedigree genome analysis genome annotation genome-wide human reference genome improved insertion/deletion mutation nanopore novel open source paralogous gene power analysis programs reconstruction reference genome screening software development statistical and machine learning telomere variant detection

Project Abstract

The overall goal of the Gabriella Miller Kids First Pediatric Research Program is to alleviate suffering from childhood cancer and structural birth defects by fostering collaborative research to uncover the etiology of these diseases. A recent addition to the program is the Kids First Long Read Pilot Projects, which are leveraging long-read sequencing technologies to further resolve the patients’ genomes. These technologies are already transforming genomics by allowing complete telomere-to-telomere (T2T) reconstructions of human genomes for the first time, and by allowing the discovery of structural variants (SVs) and other complex variants that were previously inaccessible using short-read sequencing. Here we will enhance the utility of the Kids First data sets by developing and applying optimized cloud-scale workflows for analyzing short- and long-read datasets with the new T2T-CHM13 human genome. Within the T2T consortium, we have led the effort to characterize how the CHM13 genome influences variant calling, and have found that the T2T reference universally improves the analysis of genetic variation using both short- and long-read sequencing. First, we will develop optimized workflows for analyzing short-read datasets with the T2T-CHM13 reference genome, using GATK for SNVs and small indels and Parliament2 for short-read SV discovery. Next, we will develop optimized workflows for long-read structural variant detection. Short reads struggle to detect many classes of mutations (e.g., SVs and repeat expansions) and cannot resolve many repetitive regions of the genome, including regions within many medically relevant genes. Long reads show great promise to address these challenges and discover new disease associations due to their increased mappability, variant resolution, and phasing capabilities.
To enable these technologies for Kids First, we will develop optimized workflows for accurately identifying and comparing SVs across long-read samples with Jasmine, as well as for genotyping SVs discovered by long reads within short-read datasets with Paragraph. This will enable us to analyze and prioritize variants found by long reads within the much larger number of short-read datasets. We will then apply these workflows to the Kids First data resource to develop improved variant calls and improved variant analysis of these precious samples. This will lead to the discovery of thousands of SVs that were previously missed, and will reduce the number of false variants that would otherwise confuse any downstream analysis. We will also develop new statistical and machine learning approaches for prioritizing the variants that are most likely to be related to the studied diseases, leveraging the available pedigree information and genome annotations, in support of our overall goal of identifying the driver mutations for these diseases. All workflows and software developments will be released as open source for use in CAVATICA, the cloud-based analysis platform used by all Kids First researchers, ensuring scalability and reproducibility.

Project Summary