
Computational Methods to Characterize Alternative Splicing from Massive Collections of RNA-seq Data


Basic Information

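Grant number:
10021689
Principal investigator:
Liliana D Florea
Amount:
approx. $363,200
Host institution:
JOHNS HOPKINS UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2019
Funding country:
United States
Project period:
2019-09-20 to 2023-06-30
Project status:
Completed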

Source:
https://reporter.nih.gov/project-details/10021689
Keywords:
Alternative Splicing Big Data Bioinformatics Biological Models Bypass Capillary Electrophoresis Charge Collection Complex Computer software Computing Methodologies Data Data Collection Data Set Disease Drops Elements Environment Epigenetic Process Evaluation Studies Event Farming environment Follicular thyroid carcinoma Galaxy Gene Expression Genes Genotype-Tissue Expression Project Graph Head and Neck Cancer Health Heterogeneity High-Throughput Nucleotide Sequencing Human Human Biology Individual International Introns Learning Module Length Methods Methylation Modeling Morphologic artifacts Mutation Noise Pattern Performance Physiology Protein Isoforms Quantitative Reverse Transcriptase PCR RNA RNA Splicing RNA analysis Regulation Regulator Genes Reverse Transcriptase Polymerase Chain Reaction Role Sampling Sequence Alignment Signal Transduction Spliced Genes Statistical Methods Structure Surveys System Techniques Testing The Cancer Genome Atlas Thyroid carcinoma Tissues Transcript Variant base cell type cohort data reduction deep learning design experimental study feature detection feature selection heterogenous data human disease improved innovation next generation next generation sequencing novel open source programs prototype repository sample collection simulation tool transcriptome sequencing user-friendly virtual

Project Abstract

SUMMARY: Alternative splicing (AS) is a gene regulatory mechanism with important roles in human biology and disease. High-throughput sequencing of RNA (RNA-seq) is making it possible to survey the expressed genes and their alternative splicing variations in a wide variety of cellular conditions. However, the short reads are challenging to analyze, demanding highly sophisticated computational methods that can extract meaningful AS information efficiently, accurately, and comprehensively. While there has been great progress so far, current methods based on assembling the short reads into transcript annotations have reached a plateau. We propose two innovations that can help overcome the limits. The first is one-step simultaneous analysis of multiple samples in an RNA-seq collection, in contrast with the current two-step approach that analyzes each sample separately and then merges the results. The second is to create and interrogate assembly-free representations of AS. The project will design a suite of tools that will leverage the latent information in large collections of samples and in heterogeneous data types to build complete and accurate AS signatures of tissues and cell types, and to elucidate the regulatory circuitry of AS and its functional implications. Aim 1 will develop a high-performance multi-sample transcript assembly tool, combining subexon graph representations of genes and AS variations, statistical methods for improved feature detection, and search space reduction techniques for efficient sample processing. Aim 2 will build highly efficient and accurate feature selection tools to detect and characterize assembly-free AS variations (subexons and introns) simultaneously from collections of RNA-seq samples. It will combine novel regularized programs with complex models of intronic 'noise' and other RNA-seq confounders, enabling analyses of differential splicing and the identification of individual and group-specific variations.
Lastly, Aim 3 will develop a system to comprehensively model the regulatory and functional circuitry of AS and the effects of mutations, starting from deep learning models of sequences and alignments and integrating expression, sequence, epigenetic, and mutation data across tissues, cell types, and conditions. We will rigorously test and evaluate all tools in simulations and on large public data sets, as well as on thyroid and head and neck cancer data provided by our collaborators, and we will experimentally validate random subsets of predictions with capillary electrophoresis and qRT-PCR. Collectively, the concepts, methods, and tools will establish a new framework for analyzing RNA-seq data that can efficiently tackle the 'big data' challenges, leading to more complete discovery and annotation of AS structure and function in human health and disease.
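
The abstract mentions subexon representations pooled across many samples but does not say how they are built. As a rough illustration only, and not the project's actual tool or algorithm, the Python sketch below shows one common way to think about subexons: every exon boundary seen in any sample becomes a cut point, and the covered segments between consecutive cut points are the subexons. The function name `subexons`, the half-open coordinate convention, and the two sample annotations are all hypothetical.

```python
from typing import Iterable

# Assumption: exons are half-open intervals [start, end) on one strand of one gene.
Interval = tuple[int, int]


def subexons(samples: Iterable[Iterable[Interval]]) -> list[Interval]:
    """Split the pooled exons of all samples into subexons.

    Every exon boundary observed in any sample is used as a cut point, so
    each returned segment lies entirely inside or entirely outside every exon.
    """
    exons = [exon for sample in samples for exon in sample]
    cuts = sorted({pos for start, end in exons for pos in (start, end)})
    segments = []
    for left, right in zip(cuts, cuts[1:]):
        # Keep the segment only if at least one exon fully covers it;
        # uncovered segments correspond to introns.
        if any(start <= left and right <= end for start, end in exons):
            segments.append((left, right))
    return segments


# Hypothetical data: sample B uses a shorter first exon and skips the middle exon.
sample_a = [(100, 200), (300, 350), (500, 600)]
sample_b = [(100, 180), (500, 600)]
print(subexons([sample_a, sample_b]))
```

Pooling the boundaries of all samples before cutting is a toy analogue of the one-step, multi-sample idea: the alternative 3' end at position 180 appears as its own subexon boundary even though only one sample supports it. A real tool would also weigh read support and model noise, which this sketch ignores.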
