Multicollinearity in the statistical genomics era: Proposals to account for dependencies between molecular covariates with application to animal breeding

Basic information

Project abstract

In animal breeding, molecular data (e.g. single nucleotide polymorphisms, SNPs) are incorporated as predictor variables in statistical models to improve the genomic evaluation of animals. This yields more precise estimates of the breeding values of animals that have not yet been phenotyped, which is important for breeding purposes, and helps to elucidate the genetic architecture of some traits. Both the effect size of a variant and its position on the genome are relevant. With high-dimensional SNP data in particular, a causative variant can be pinpointed to a specific base pair on the genome. However, because the number of model parameters grows with the still increasing number of SNPs, multicollinearity between covariates can affect the results of whole-genome regression methods. The objective of this study is to explicitly incorporate the dependencies between molecular covariates, which arise from linkage and linkage disequilibrium among chromosome segments, in order to estimate SNP effects more accurately. The theoretical covariance between SNP genotypes can be used to filter the full set of SNPs down to fewer but representative predictor variables. Furthermore, a joint approach is proposed that allows relevant predictors to be selected and shrunk simultaneously. It is hypothesised that this method fulfils the requirements of genomic evaluation: the dependencies between SNPs are taken into account, smooth estimates are obtained within groups of highly correlated SNPs, and the solution is sparse both among and within these groups. Thus, genomic regions that affect a trait can be identified.
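
The abstract asks for a solution that is sparse both among and within groups of correlated SNPs, and the publication listed under the project outputs names the sparse-group lasso solved by proximal gradient descent. The following is a minimal sketch of that general technique, not the project's own implementation. The function names (`soft_threshold`, `sgl_prox`, `sgl_fit`), the loss scaling of 1/(2n), the square-root group weights and the fixed step size are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, thresh):
    """Elementwise soft-thresholding: the proximal operator of the lasso penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def sgl_prox(z, groups, step, lam, alpha):
    """Proximal operator of the sparse-group lasso penalty (illustrative).

    Assumed penalty: lam * (alpha * ||b||_1
                            + (1 - alpha) * sum_g sqrt(p_g) * ||b_g||_2),
    where p_g is the size of group g.
    """
    # Step 1: within-group sparsity via elementwise soft-thresholding.
    out = soft_threshold(z, step * lam * alpha)
    # Step 2: between-group sparsity via shrinking each group's norm.
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(out[idx])
        weight = step * lam * (1.0 - alpha) * np.sqrt(idx.sum())
        scale = max(0.0, 1.0 - weight / norm) if norm > 0 else 0.0
        out[idx] *= scale
    return out


def sgl_fit(X, y, groups, lam, alpha=0.5, n_iter=500):
    """Proximal gradient descent for least squares with the penalty above."""
    n, p = X.shape
    # The gradient of (1/2n)||y - Xb||^2 is Lipschitz with constant
    # sigma_max(X)^2 / n, so its inverse is a safe fixed step size.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = sgl_prox(beta - step * grad, groups, step, lam, alpha)
    return beta
```

Here `groups` is an integer array that assigns each SNP (column of `X`) to a group, for example a block of SNPs in high linkage disequilibrium. Setting `alpha=1` reduces the penalty to the plain lasso and `alpha=0` to the group lasso. How the groups are formed is left open in this sketch.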

Project outputs

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent
  • DOI:
    10.1186/s12859-020-03725-w
  • Publication date:
    2020-09-15
  • Journal:
  • Impact factor:
    3
  • Authors:
    Klosa, Jan; Simon, Noah; Wittenburg, Doerte
  • Corresponding author:
    Wittenburg, Doerte

Other grants of Dr. Dörte Wittenburg

The role of the theoretical covariance between SNPs in the design of experiments in genomic evaluations
  • Grant number:
    320694892
  • Fiscal year:
    2016
  • Funding amount:
    --
  • Project category:
    Research Grants

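Similar NSFC grants

Research on wireless opportunistic scheduling algorithms based on stochastic network calculus
  • Grant number:
    60702009
  • Approval year:
    2007
  • Funding amount:
    CNY 240,000
  • Project category:
    Young Scientists Fund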

Similar overseas grants

Data Science and Statistics Core
  • Grant number:
    10549489
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Comprehensive and non-invasive prenatal screening of coding variation
  • Grant number:
    10678005
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Characterizing the genetic etiology of delayed puberty with integrative genomic techniques
  • Grant number:
    10663605
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Bayesian genetic association analysis of all rare diseases in the Kids First cohort
  • Grant number:
    10643463
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Uncovering the Role of the MS4A Gene Family in Alzheimer's Disease
  • Grant number:
    10751885
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
New approaches for leveraging single-cell data to identify disease-critical genes and gene sets
  • Grant number:
    10768004
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Next-Generation Algorithms in Statistical Genetics Based on Modern Machine Learning
  • Grant number:
    10714930
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Multi-modal insights of spatially distributed cells with associations of diseases and drug response
  • Grant number:
    10714602
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Deep Learning Image Analysis Algorithms to Improve Oral Cancer Risk Assessment for Oral Potentially Malignant Disorders
  • Grant number:
    10805177
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Identifying structural variants influencing human health in population cohorts
  • Grant number:
    10889519
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category: