Cross-platform structural variant discovery with deep learning
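Basic information
- Grant number: 10453237
- Principal investigator:
- Amount: $593,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-09-01 to 2027-06-30
- Project status: Ongoing
- Source: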
- Keywords: Algorithms, Alzheimer's Disease, Architecture, Autoimmune Diseases, Benchmarking, Cardiovascular Diseases, Clinical, Communities, Complex, Computer Vision Systems, Computer software, Consensus, Coupled, Data, Data Reporting, Data Set, Detection, Development, Diagnosis, Dimensions, Disease, Engineering, Ensure, Evaluation, Formulation, Generations, Genetic, Genetic Diseases, Genetic Variation, Genome, Genotype, Goals, Hand, Hand functions, Hi-C, Human Genetics, Human Genome, Hybrids, Image, Learning, Link, Machine Learning, Malignant Neoplasms, Manuals, Medicine, Methodology, Methods, Mind, Modeling, Pattern, Performance, Play, Property, Research, Resolution, Role, Sampling, Science, Sequence Alignment, Signal Transduction, Source, Statistical Models, Structural Models, Structure, Techniques, Technology, Training, Variant, Work, autism spectrum disorder, blind, cancer genome, convolutional neural network, deep learning, deep learning model, deep neural network, design, diverse data, engineering design, experimental study, flexibility, genome sequencing, genomic data, heuristics, improved, method development, nervous system disorder, neural network, precision medicine, prototype, sequencing platform, simulation, tumor, variant detection, whole genome
Project abstract
Structural variants (SVs) are major drivers of genetic diversity and disease in the human genome, and their
discovery is imperative to advances in precision medicine and our understanding of human genetics. Due to
revolutionary breakthroughs in whole-genome sequencing technologies, we now have access to genomic data at an
unprecedented scale and resolution. However, despite tremendous effort and progress in SV calling methodology,
general SV discovery remains unsolved. Existing techniques use hand-engineered features and heuristics to
model SV classes, relying heavily on developer expertise, which cannot scale to the vast diversity of SV types and
sequencing platforms nor fully harness all the information available in raw sequencing data. As a result, these
methods are usually tightly coupled to the properties of a particular sequencing technology and operate optimally
only on certain SV types and sizes, rendering us blind to many other classes of SVs and their role in disease. Deep
neural networks have the ability to learn complex abstractions automatically from the data and hence offer a
promising avenue for general SV discovery. Deep learning has recently transformed the field of machine learning
and led to remarkable advances in science and medicine. In this proposal we aim to leverage the potential of deep
learning for the problem of SV detection. We lay out how to efficiently formulate SV detection as a deep learning
task, and propose the development of a comprehensive framework to call and genotype SVs of different size and
type, including complex and subclonal SVs, given data from a range of sequencing platforms. In particular, we
demonstrate that state-of-the-art results can be obtained using our approach for short, linked, and long read
datasets. In order to ensure that our models generalize across different datasets, an important goal of our proposal
is also to assemble diverse and representative training data and perform extensive evaluation using publicly
available multi-platform datasets to accurately assess model performance. Our software will be built with
extensibility and scalability in mind, and will be released, along with pretrained models and callsets, freely to the
community.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Victoria Popic
Other grants by Victoria Popic
Cross-platform structural variant discovery with deep learning
- Grant number: 10686879
- Fiscal year: 2022
- Funding amount: $593,900
- Project category: