Optimized workflows for structural variant analysis of the Kids First genomes using short and long reads
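Basic information
- Grant number: 10432507
- Principal investigator:
- Amount: $156,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-04-01 to 2024-03-31
- Project status: Completed
- Source: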
- Keywords: Address, Affect, Algorithms, Base Pairing, Biological Sciences, Code, Complex, Data Analyses, Data Set, Development, Disease, Ensure, Ethnic Origin, Etiology, Family member, Fostering, Genes, Genetic, Genetic Variation, Genome, Genomics, Genotype, Goals, Human Genome, Individual, Jasminum, Malignant Childhood Neoplasm, Maps, Medical, Mutation, Patients, Pediatric Research, Phase, Pilot Projects, Population, Proteins, Repetitive Sequence, Reproducibility, Research, Research Personnel, Resolution, Sampling, Structural Congenital Anomalies, Technology, Time, Variant, X Chromosome, autosome, cloud based, cohort, data resource, driver mutation, genetic analysis, genetic pedigree, genome analysis, genome annotation, genome-wide, human reference genome, improved, insertion/deletion mutation, nanopore, novel, open source, paralogous gene, power analysis, programs, reconstruction, reference genome, screening, software development, statistical and machine learning, telomere, variant detection
Project Summary
The overall goal of the Gabriella Miller Kids First Pediatric Research Program is to alleviate suffering from
childhood cancer and structural birth defects by fostering collaborative research to uncover the etiology of
these diseases. A recent addition to the program is the Kids First Long Read Pilot Projects, which are leveraging
long-read sequencing technologies to further resolve the patients’ genomes. Already these technologies are
transforming genomics by allowing complete telomere-to-telomere (T2T) reconstructions of human genomes
for the first time, and by allowing the discovery of structural variants and other complex variants that were
previously inaccessible using short read sequencing.
Here we will enhance the utility of the Kids First data sets by developing and applying optimized cloud-scale
workflows for analyzing short and long read datasets with the new T2T-CHM13 human genome. Within the
T2T consortium, we have led the effort to characterize how the CHM13 genome influences variant calling, and
have found the T2T reference universally improves the analysis of genetic variation using both short and long
read sequencing. Here we will develop optimized workflows for analyzing short read datasets with the
T2T-CHM13 reference genome using GATK for SNVs and small indels, and Parliament2 for short-read SV
discovery. Next, we will develop optimized workflows for long-read structural variant detection. Short reads
struggle to detect many classes of mutations (e.g., SVs and repeat expansions) and cannot resolve many
repetitive regions of the genome, including regions within many medically relevant genes. Long reads show great
promise to address these challenges and discover new disease associations because of their increased mappability,
variant resolution, and phasing capabilities. To enable these technologies for Kids First, we will develop
optimized workflows for accurately identifying and comparing SVs across long read samples with Jasmine, as
well as genotyping SVs discovered by long reads within short read datasets with Paragraph. This will enable us
to analyze and prioritize variants found by long reads within the much larger numbers of short read datasets.
We will then apply these workflows to the Kids First data resource to develop improved variant calls and
improved variant analysis of these precious samples. This will lead to the discovery of thousands of SVs that
were previously missed, and will reduce the number of false variants that would otherwise confuse any
downstream analysis. We will also develop new statistical and machine learning approaches for prioritizing the
variants that are most likely to be related to the studied diseases, leveraging the pedigree information and
genome annotations available, in support of our overall goal of identifying the driver mutations for these
diseases. All workflows and software developments will be released open source for use in CAVATICA, the
cloud-based analysis platform used by all Kids First researchers, ensuring scalability and reproducibility.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by MICHAEL SCHATZ
EXPANDING THE GENOMIC DATA SCIENCE COMMUNITY NETWORK FOR NHGRI.
- Grant number: 10944109
- Fiscal year: 2023
- Funding amount: $156,300
- Project category:
Optimized workflows for structural variant analysis of the Kids First genomes using short and long reads
- Grant number: 10602532
- Fiscal year: 2022
- Funding amount: $156,300
- Project category:
Integrative genomic and epigenomic analysis of cancer using long read sequencing
- Grant number: 10396074
- Fiscal year: 2021
- Funding amount: $156,300
- Project category:
Integrative genomic and epigenomic analysis of cancer using long read sequencing
- Grant number: 10599150
- Fiscal year: 2021
- Funding amount: $156,300
- Project category:
Integrative genomic and epigenomic analysis of cancer using long read sequencing
- Grant number: 10187808
- Fiscal year: 2021
- Funding amount: $156,300
- Project category:
Similar overseas grants
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Grant number: 2327346
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Standard Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Grant number: 2312555
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Standard Grant
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Grant number: BB/Z514391/1
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Training Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Grant number: ES/Z502595/1
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Fellowship
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Grant number: ES/Z000149/1
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Research Grant
Building Affect-X, an index of individual differences in affective sensibility, and establishing a foundation for bespoke AI services (感性個人差指標 Affect-X の構築とビスポークAIサービスの基盤確立)
- Grant number: 23K24936
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Grant-in-Aid for Scientific Research (B)
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Grant number: 2901648
- Fiscal year: 2024
- Funding amount: $156,300
- Project category: Studentship
ERI: Developing a Trust-supporting Design Framework with Affect for Human-AI Collaboration
- Grant number: 2301846
- Fiscal year: 2023
- Funding amount: $156,300
- Project category: Standard Grant
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Grant number: 488039
- Fiscal year: 2023
- Funding amount: $156,300
- Project category: Operating Grants
How motor impairments due to neurodegenerative diseases affect masticatory movements
- Grant number: 23K16076
- Fiscal year: 2023
- Funding amount: $156,300
- Project category: Grant-in-Aid for Early-Career Scientists