
Design, prediction, and prioritization of systematic perturbations of the human genome


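Basic information

Grant number:
10665666
Principal investigator:
ANDREW S ALLEN
Amount:
$729,800
Host institution:
DUKE UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2021
Funding country:
United States
Project period:
2021-09-01 to 2026-05-31
Project status:
Ongoing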

Source:
https://reporter.nih.gov/project-details/10665666
Keywords:
Address; Biological Assay; Catalogs; Clustered Regularly Interspaced Short Palindromic Repeats; Code; Computational Biology; Coupled; Coupling; Data; Diagnostic; Disease; End Point Assay; Epigenetic Process; Evolution; Experimental Designs; Explosion; Gene Expression; Gene Expression Regulation; Genetic; Genetic Code; Genetic Determinism; Genetic Variation; Genome; Genomic Segment; Goals; Health; Human; Human Genome; Incidence; Link; Machine Learning; Measures; Methods; Modeling; Mutagenesis; Neural Network Simulation; Outcome; Output; Parameter Estimation; Pathogenicity; Patients; Phenotype; Population; Population Genetics; Prevention; Recommendation; Regulatory Element; Risk; Sample Size; Site; Software Tools; Statistical Models; Testing; Untranslated RNA; Update; Variant; cell type; combinatorial; deep neural network; design; experience; experimental study; flexibility; gene discovery; genetic variant; genome sequencing; health determinants; human disease; improved; individual patient; machine learning method; machine learning model; member; models and simulation; novel; predictive modeling; statistical and machine learning; success; tool; whole genome

Project abstract

Noncoding genetic variation that alters gene regulation is of paramount importance for health, disease, and evolution. Diseases ranging in incidence from the most common to the rarest all carry substantial risk associated with regulatory variation, and most of the genetic differences between closely related species are noncoding. Whole genome sequencing can directly identify that variation, but realizing its potential to elucidate the genetic determinants of health and disease will require accurate annotation of this noncoding variation for functionality. In coding sequence, the genetic code allows variants to be annotated to a rough hierarchy of likely functional effects and pathogenicity. In noncoding sequence, such annotation is less clear. Perturbation assays, i.e., assays that modify genetic or epigenetic states and measure the effect of those perturbations on regulatory endpoints, offer a possible path to annotating noncoding variation. However, fully leveraging these data requires novel and sophisticated statistical and machine learning approaches to extract useful information from those assays, to integrate that information across regulatory endpoints, and to extrapolate findings so that annotation of previously unobserved (unperturbed) variation in diverse cell types is possible. The goal of the Duke Prediction Center is to develop the analytic approaches and tools that will allow for the routine annotation of noncoding variation for functionality and, ultimately, pathogenicity. Aim 1 is to establish best practices in perturbation assay design and analysis. This will allow IGVF characterization centers to design their experiments so that, when coupled with optimized analyses, the data produced will be maximally informative for subsequent predictive modeling. Aim 2 is to develop novel mechanistic machine learning approaches for predicting the functional effect of noncoding variation in diverse cell types. Aim 3 is to identify noncoding genomic regions that are subject to functional constraint, which will be leveraged in prioritizing variants for pathogenicity. The expected outcomes of this project will be (i) robust estimates of optimal experimental design parameters and recommendations for analysis tools and best practices for the various assays used within the IGVF consortium, (ii) predicted functional effects of observed variation, to be shared through the IGVF variant/phenotype catalog, as well as a state-of-the-art machine learning method (and associated tools) that can identify previously unknown interactions among genomic variants, both observed and novel, and predict their functional impact in diverse cell types, and (iii) a list of regulatory elements subject to functional constraint, shared through the IGVF variant/phenotype catalog, and a principled prioritization framework (and associated tools) for interpreting variation within patient genomes for pathogenicity. Due to the considerable success of genetics, there are thousands of unknown regulatory causes of disease. Each of those causes is an opportunity to improve treatment, diagnostics, or prevention. This project will be a major advance towards unlocking that potential.
