Statistical, population genetics and genetic epidemiology

Basic information

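Approval number: 8929785
Principal investigator: Dmitri V. Zaykin
Amount: approximately $279,200
Institution: NATIONAL INSTITUTE OF ENVIRONMENTAL HEALTH SCIENCES
Institution country: United States
Project category:
Fiscal year:
Funding country: United States
Project period:
Project status: Not concluded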

Source: https://reporter.nih.gov/project-details/8929785
Keywords:
Accounting Analysis of Variance Cancer cell line Complex Data Data Set Data Sources Development Disease Disease model Disease susceptibility Drowning Environmental Exposure Environmental Risk Factor Family Frequencies Genes Genetic Genetic Determinism Genetic Polymorphism Genetic Variation Genome Genomics Human Human Genetics Linkage Disequilibrium Maps Medicine Methodology Methods Mutagenesis Mutation Onset of illness Pattern Performance Persons Play Population Population Genetics Positioning Attribute Probability Public Health Publishing Reaction Research Role Shapes Signal Transduction Statistical Methods Testing Time Variant base cancer genome design disorder prevention flexibility genetic association genetic epidemiology genetic variant genome sequencing genome wide association study health related quality of life human disease improved interest mathematical methods next generation next generation sequencing novel rare variant response trait

Project abstract

The genetic makeup of a person can be thought of as shaping their propensity to complex diseases, while environmental factors trigger the onset of disease and, together with genetic factors, can modify its progression. My group's research reflects our continuing involvement in the design and analysis of large-scale genetic and genomic studies. We continue to develop methodology that is useful not only for genetic applications but also, more generally, for the analysis of other kinds of multidimensional data in which many statistical hypotheses are evaluated simultaneously.

1. Strategies for the design of large-scale studies and methods for analysis of top-ranking signals in high-dimensional data. Studies with large numbers of genetic association tests, such as genome-wide association and sequencing studies, commonly target human diseases that are already known to have heritable components. These studies are therefore very likely to contain truly associated signals, so testing the family-wise null hypothesis is of little interest. The question is not "if" the data contain genuine signals, but "where" these signals are located among the multitude of tested variants. A particular true signal can empirically rank first, second, and so on, and each of these possible ranks has a different probability. These ranking probabilities can be estimated and used, for example, to predict the number of true discoveries expected in a study. We continue statistical research on efficient ways to estimate and use the probabilistic rankings of true signals.

2. Using genomics to better separate the wheat from the chaff. This research aims to develop approaches that exploit expanding genomic data sources, including next-generation sequencing data. We have begun to design statistical approaches for analyzing somatic mutagenesis in cancer cell line populations, using the allelic fractions of different types of mutations and patterns of mutagenesis. This project is starting to produce preliminary results that enable us to pinpoint mutations that are highly likely to have occurred simultaneously, based on the similarity of their allelic fractions, their mutation signatures, and their co-localization within the cancer genome.

3. Statistical methods for detecting genetic associations with disease using next-generation genomic data and rare variants. Our continuing research on methods for mapping the genetic determinants of disease is being extended to accommodate whole-genome sequencing data. One advantage of sequencing is that new rare and low-frequency variants can be assessed, including those carried chiefly by subjects with the condition under study. Statistical approaches for rare-variant association are being developed rapidly despite a major statistical challenge: association tests have low power at each individual rare variant. Improved statistical methods are therefore needed to pool information across both rare and common variants within a genetic region. We are developing methods based on the functional analysis of variance framework. Whereas traditional analysis of variance (ANOVA) compares fixed group means, functional analysis of variance (FANOVA) compares functions; for example, a group's function may depend on time. In the genetic context, the genomic position of a variant within a gene plays the role of time, that is, it serves as the argument of the function. Function-valued approaches have several attractive features, including smoothing capabilities and an inherent ability to account for linkage disequilibrium among genetic variants. FANOVA appears to be a powerful approach: we compared the performance of our extension of FANOVA with other published approaches and found that it has considerably greater power than competing methods for the disease models studied, including models in which most rare variants are deleterious and models with a mix of protective and deleterious variants. We will continue to improve the functional approach to enhance its flexibility and power.
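The ranking-probability idea in item 1 of the abstract can be illustrated with a small Monte Carlo simulation. This is a minimal sketch, not the group's method: it assumes a hypothetical setting of `m` independent z-statistics with unit variance, of which `k` are true signals sharing a common mean `mu` and the rest are nulls, with tests ranked by |z|. The function `rank_probabilities` and all parameter values are illustrative assumptions. It estimates, for each of the top `top_r` ranks, the probability that the test at that rank is a true signal; by linearity of expectation, the sum of these probabilities estimates the expected number of true signals among the top `top_r` tests.

```python
import random


def rank_probabilities(m, k, mu, top_r, n_sims=1000, seed=1):
    """Estimate P(the test ranked j-th is a true signal) for j = 1..top_r.

    Hypothetical model (an assumption for illustration only): m independent
    z-statistics with unit variance; the first k have mean `mu` (true
    signals) and the remaining m - k have mean 0 (nulls). Tests are ranked
    by |z|, largest first.
    """
    rng = random.Random(seed)
    hits = [0] * top_r
    for _ in range(n_sims):
        # One simulated study: (|z|, is_true_signal) for every test.
        stats = [(abs(rng.gauss(mu if i < k else 0.0, 1.0)), i < k) for i in range(m)]
        stats.sort(key=lambda s: s[0], reverse=True)
        for j in range(top_r):
            if stats[j][1]:
                hits[j] += 1
    return [h / n_sims for h in hits]


if __name__ == "__main__":
    probs = rank_probabilities(m=500, k=5, mu=3.5, top_r=10)
    for rank, p in enumerate(probs, start=1):
        print(f"rank {rank}: P(true signal) ~ {p:.3f}")
    # Linearity of expectation: expected number of true signals in the top 10.
    print(f"expected true signals among top 10 ~ {sum(probs):.2f}")
```

Larger effect sizes or fewer tests push the true signals toward the top ranks; the estimates are only as good as the assumed effect-size model, which in practice would itself have to be estimated.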

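To make the functional idea in item 3 of the abstract concrete, here is a simplified sketch of a functional two-group comparison; it is not the group's FANOVA extension, whose details the abstract does not give. Variant position serves as the function argument: for cases and controls separately, per-variant allele frequencies are smoothed over position with a Gaussian-kernel (Nadaraya-Watson) smoother, and the statistic is the integrated squared difference between the two smoothed curves, approximated on a grid. Significance is assessed by permuting case/control labels; because each subject's genotype row is kept intact, the permutation null preserves linkage disequilibrium among variants. The function names, the kernel, the bandwidth, and the 0/1/2 genotype coding are hypothetical choices for illustration.

```python
import math
import random


def kernel_smooth(positions, values, grid, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel, evaluated at each grid point."""
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in positions]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("bandwidth is too small for the variant spacing")
        curve.append(sum(w * v for w, v in zip(weights, values)) / total)
    return curve


def allele_freqs(genotypes, subjects):
    """Per-variant allele frequency among `subjects` (genotypes coded 0/1/2)."""
    n_variants = len(genotypes[0])
    return [sum(genotypes[i][j] for i in subjects) / (2 * len(subjects))
            for j in range(n_variants)]


def functional_stat(genotypes, labels, positions, grid, bandwidth):
    """Integrated squared difference between smoothed case and control curves."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    f_case = kernel_smooth(positions, allele_freqs(genotypes, cases), grid, bandwidth)
    f_ctrl = kernel_smooth(positions, allele_freqs(genotypes, controls), grid, bandwidth)
    step = grid[1] - grid[0]
    # Riemann-sum approximation of the integral over the gene region.
    return sum((a - b) ** 2 for a, b in zip(f_case, f_ctrl)) * step


def permutation_test(genotypes, labels, positions, bandwidth, n_grid=50, n_perm=200, seed=1):
    """Return (observed statistic, permutation p-value)."""
    lo, hi = min(positions), max(positions)
    grid = [lo + (hi - lo) * t / (n_grid - 1) for t in range(n_grid)]
    observed = functional_stat(genotypes, labels, positions, grid, bandwidth)
    rng = random.Random(seed)
    permuted = list(labels)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle phenotype labels only; genotype rows (and hence LD) are unchanged.
        rng.shuffle(permuted)
        if functional_stat(genotypes, permuted, positions, grid, bandwidth) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical simulated data: 30 rare variants 100 bp apart, 100 cases and
    # 100 controls; cases carry the minor allele more often at variants 10-19.
    rng = random.Random(7)
    positions = [100 * j for j in range(30)]
    labels = [1] * 100 + [0] * 100
    genotypes = []
    for y in labels:
        row = []
        for j in range(30):
            freq = 0.05 if (y == 1 and 10 <= j < 20) else 0.01
            row.append(sum(rng.random() < freq for _ in range(2)))
        genotypes.append(row)
    stat, p = permutation_test(genotypes, labels, positions, bandwidth=300.0)
    print(f"statistic = {stat:.6f}, permutation p-value = {p:.3f}")
```

Because the curves are smoothed before they are compared, frequency differences at neighbouring variants are pooled, which is the kind of borrowing of information across a region that item 3 motivates. The squared-difference statistic does not require all effects to point in the same direction, although differences of opposite sign at nearby positions can partly cancel during smoothing.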