Discovery and analysis of structural variation in whole genome sequences

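Basic information

Grant number: 9118280
Principal investigator: RYAN E MILLS
Amount: $380,600
Host institution: UNIVERSITY OF MICHIGAN AT ANN ARBOR
Host institution country: United States
Project category:
Fiscal year: 2013
Funding country: United States
Project period: 2013-09-13 to 2018-07-31
Project status: Completed

Source: https://reporter.nih.gov/project-details/9118280
Keywords: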
Address Algorithms Alleles Area Benign Chromosomal Rearrangement Clinical Communities Complex DNA Sequence Alteration DNA Sequence Rearrangement Data Data Set Databases Detection Diagnostic Disease Event Frequencies Future Genetic Genetic Variation Genome Genomics Genotype Goals Health Human Genome Individual Inherited Karyotype determination procedure Length Machine Learning Medical Medical Genetics Methodology Methods Modeling Nature Organism Pathogenicity Population Publishing Reading Records Reporting Research Research Personnel Resolution Scanning Seeds Source Specificity Statistical Models Structure System Techniques Technology Testing Training Variant Work base clinical Diagnosis clinical application cohort direct application disease phenotype genetic variant genome sequencing genomic variation improved interest markov model rare variant structural genomics tool virtual whole genome

Project abstract

DESCRIPTION (provided by applicant): The whole genome sequencing of large cohorts of individuals is quickly becoming a common tool for researchers to investigate the genetic basis of many disease phenotypes. The primary goals are to discover the underlying genetic variation that causes or contributes to these diseases as well as to correctly identify these variants in a diagnostic setting. These differences typically consist of single base changes (SNPs), but can also encompass larger, more complex chromosomal rearrangements in the form of structural variation (SV), which are much more difficult to detect even with modern sequencing technologies. A number of approaches have been published that have studied this problem, but even the largest scale endeavors have only focused on deletion events and reported a sensitivity of <70%. Complex chromosomal rearrangements are even less well studied. Thus, it is paramount that accurate methods are developed which can detect all types of SVs at high specificity from sequence data. This proposal aims to improve the overall ability of researchers to identify and analyze genetic variation from whole genome sequences. An important, and often overlooked, aspect of SV discovery is the fact that typical paired-end, read depth, and split read approaches will identify different sets of non-overlapping variants at varying degrees of accuracy. In Aim 1, we will develop a unified SV discovery algorithm that can incorporate all of these different sources of information in a probabilistic fashion. Such a method would be useful for research, in particular for the identification of rare variants, as well as for clinical applications, which require a great deal of accuracy and have thus far been limited to older karyotyping and microarray approaches.
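To make the idea of combining evidence "in a probabilistic fashion" concrete, here is a minimal sketch and not the project's actual algorithm, which the abstract does not specify. It assumes each evidence source (paired-end, read depth, split read) has already been summarized as a likelihood ratio, and it assumes the sources are conditionally independent. The function name `combine_evidence`, the prior, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def combine_evidence(prior, likelihood_ratios):
    """Return the posterior probability that an SV is present at a locus.

    prior: assumed prior probability of an SV at the locus, in (0, 1).
    likelihood_ratios: dict mapping an evidence source name to
        P(observation | SV) / P(observation | no SV), each > 0.

    Assumption for this sketch only: the evidence sources are
    conditionally independent, so their log likelihood ratios add.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for source, ratio in likelihood_ratios.items():
        if ratio <= 0.0:
            raise ValueError(f"likelihood ratio for {source} must be positive")
        log_odds += math.log(ratio)
    # Convert the log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values: every number below is made up for illustration.
posterior = combine_evidence(
    prior=0.001,
    likelihood_ratios={"paired_end": 20.0, "read_depth": 5.0, "split_read": 50.0},
)
print(f"posterior probability of an SV: {posterior:.3f}")
```

Working in log odds keeps the update additive, so a source that is uninformative at a locus (ratio of 1) simply contributes nothing. Real paired-end, read-depth, and split-read signals are correlated, so a unified discovery algorithm of the kind proposed in Aim 1 would need a richer model than this independence assumption.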
Such a unified method would identify the majority of structural variants; however, there are many regions in genomic sequences that are complex in nature, defined as consisting of multiple neighboring or overlapping chromosomal rearrangements that are challenging to resolve with typical SV detection approaches. In Aim 2, we propose methods to resolve these complex regions and assess their frequency and impact. Furthermore, a crucial step in medical genetics is the comparison of identified genetic mutations to databases of known pathogenic and benign variants. This is currently problematic with SVs, as they have often been originally reported with varying degrees of breakpoint resolution, which can hamper the correct assignment of the variant. This issue is compounded further in more complex regions with multiple breakpoints, for which simplistic comparison methods do not work well. In Aim 3, we will develop and implement a system that describes and utilizes variant profiles to identify whether an individual's sequence data contains a variant of interest. Overall, this project will advance our understanding of the human genome as well as provide tools for use in the general research and clinical communities.
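The breakpoint-resolution problem described for database comparison can be illustrated with a small sketch. This is not the variant-profile system proposed in Aim 3, whose design the abstract does not describe. It is a simpler stand-in that assumes each breakpoint is reported as an uncertainty interval and treats two records as compatible when both breakpoint intervals overlap. The `SVRecord` type, its field names, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVRecord:
    """A structural variant with uncertain breakpoints (hypothetical schema)."""
    chrom: str
    sv_type: str   # e.g. "DEL", "DUP", "INV"
    start_lo: int  # lowest plausible position of the first breakpoint
    start_hi: int  # highest plausible position of the first breakpoint
    end_lo: int    # lowest plausible position of the second breakpoint
    end_hi: int    # highest plausible position of the second breakpoint


def intervals_overlap(a_lo, a_hi, b_lo, b_hi):
    """Return True if the closed intervals [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi


def may_match(call, known):
    """Return True if two records could describe the same variant.

    Two records are considered compatible when they share a chromosome
    and SV type and both pairs of breakpoint intervals overlap.
    """
    return (
        call.chrom == known.chrom
        and call.sv_type == known.sv_type
        and intervals_overlap(call.start_lo, call.start_hi,
                              known.start_lo, known.start_hi)
        and intervals_overlap(call.end_lo, call.end_hi,
                              known.end_lo, known.end_hi)
    )


# Made-up coordinates: a low-resolution call and a base-pair-resolved entry.
call = SVRecord("chr1", "DEL", 1000, 1050, 5000, 5100)
known = SVRecord("chr1", "DEL", 1040, 1040, 5090, 5090)
print(may_match(call, known))
```

A check like this handles a single pair of breakpoints only. Complex regions with several neighboring or overlapping rearrangements have multiple breakpoints, which is where the abstract notes that simplistic comparisons stop working well.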
