Interpretable Computational Models of Functional Genomics Data

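Basic Information

Grant number:
10453055
Principal investigator:
Peter K Koo
Award amount:
approx. $417,300
Host institution:
COLD SPRING HARBOR LABORATORY
Host institution country:
United States
Project type:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-09-07 to 2027-06-30
Project status:
Ongoing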

Source:
https://reporter.nih.gov/project-details/10453055
Keywords:
5' Splice Site; Acceleration; Address; Alternative Splicing; Bayesian Analysis; Biological; Biological Assay; Biological Process; Biological Sciences; Catalogs; ChIP-seq; Code; Communities; Complex; Computer Models; Computer software; Computing Methodologies; DNA; DNA Sequence; Data; Dependence; Development; Disease; Elements; Exhibits; Genetic; Genetic Transcription; Genome; Genomics; Goals; Individual; Knowledge; Laboratories; Learning; Maps; Methods; Modeling; Modernization; Mutagenesis; Network-based; Nucleotides; Performance; Positioning Attribute; RNA; RNA-Binding Proteins; Regulatory Element; Resolution; Sequence Analysis; Specific qualifier value; Training; Transcriptional Regulation; Translating; Variant; Weight; Work; base; computerized tools; convolutional neural network; crosslinking and immunoprecipitation sequencing; density; design; direct application; experimental study; functional genomics; genome-wide; genomic data; genomic locus; human disease; improved; in silico; in vivo; insight; learning network; machine learning method; multiplex assay; neural network architecture; open source; prototype; syntax; transcription factor; user-friendly

Project Abstract

Understanding how the coordination of cis-regulatory elements (CREs) influences biological processes, such as transcription and alternative splicing, is a major goal in computational genomics. This remains a challenge because CRE activity at any given locus may depend on a host of other factors, including sequence context and the presence of other CREs nearby. Recent developments in deep convolutional neural networks (CNNs) have revolutionized our ability to predict regulatory functions from DNA sequence. Unlike previous computational methods based on position-weight matrices, which capture an additive model of CREs, CNNs can, in principle, also learn higher-order dependencies within the CRE, with other CREs, and with the broader sequence context. However, CNNs are essentially black-box models, with parameters that do not have clear biological meaning. Hence, it remains a challenge to translate the improved predictions of a CNN into new biological insights.

Here we propose to develop three different computational methods that can comprehensively characterize higher-order interactions within CREs and across different CREs from functional genomics data, specifically ChIP-seq and CLIP-seq data publicly available through ENCODE. Each method serves as its own separate Aim and will be developed in parallel. In Aim 1, we will develop a new post hoc model interpretability method that applies interpretable quantitative models, originally developed to understand complex genetic interactions in laboratory-based comprehensive mutagenesis (e.g., multiplex assays of variant effects), to characterize the CRE dependencies learned by a CNN, using synthetic sequences to target specific biological hypotheses. In Aim 2, we will develop new CNN architectures in which the learned parameters express higher-order interactions that have direct biological interpretations. In Aim 3, we will combine a Bayesian nonparametric framework for modeling CREs with CNN-based CRE annotations and GPU acceleration to develop new methods for understanding how CREs are specified in the genome.

Successful completion of these Aims will provide a leap forward in our understanding of higher-order CRE dependencies that are exploited but have not yet been fully revealed by CNNs. This work will provide the community with: (1) a new suite of open-source computational tools that address the problem of modeling CREs and their dependencies in functional genomics data; and (2) a comprehensive genome-wide catalogue of CRE syntax for transcription factors and RNA-binding proteins that will be hosted on a user-friendly webserver.
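
The contrast the abstract draws between an additive position-weight matrix and a model with higher-order dependencies can be illustrated with a small in silico mutagenesis sketch. This is not the project's method: the weights, the 4-nt site, the function names, and the `interacting_score` stand-in for a black-box model are all hypothetical and chosen only for illustration. The sketch uses a double-mutant cycle, which returns zero for any purely additive model and a nonzero value when two positions interact.

```python
# Toy illustration only: all weights and names below are hypothetical,
# not taken from the project or from any published model.

# Hypothetical position-weight matrix for a 4-nt site:
# one score per base at each position.
PWM = [
    {"A": 1.0, "C": -0.5, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -0.5, "T": -0.5},
    {"A": -0.5, "C": -0.5, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -0.5, "G": -0.5, "T": 1.0},
]


def additive_score(seq):
    """Additive model: the score is the sum of independent per-position terms."""
    return sum(PWM[i][base] for i, base in enumerate(seq))


def interacting_score(seq):
    """Stand-in for a black-box model: additive score plus a bonus that is
    only granted when position 1 is C and position 2 is G together."""
    bonus = 2.0 if seq[1] == "C" and seq[2] == "G" else 0.0
    return additive_score(seq) + bonus


def mutate(seq, pos, base):
    """Return a copy of seq with the base at pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


def pairwise_epistasis(model, wild_type, mut_a, mut_b):
    """Double-mutant cycle: f(ab) - f(a) - f(b) + f(wt).
    Each mutation is a (position, base) tuple."""
    (pos_a, base_a), (pos_b, base_b) = mut_a, mut_b
    single_a = mutate(wild_type, pos_a, base_a)
    single_b = mutate(wild_type, pos_b, base_b)
    double = mutate(single_a, pos_b, base_b)
    return model(double) - model(single_a) - model(single_b) + model(wild_type)


wild_type = "ACGT"
print(pairwise_epistasis(additive_score, wild_type, (1, "T"), (2, "A")))     # 0.0
print(pairwise_epistasis(interacting_score, wild_type, (1, "T"), (2, "A")))  # 2.0
```

The additive model yields exactly zero because each position contributes independently, whereas the interacting model recovers the 2.0 bonus tied to the C-G pair. Querying a trained CNN with designed synthetic sequences in this way is one simple route to probing dependencies that per-position scores cannot express.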

Project Summary