Nonparametric methods for functional and translational genomics

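Basic Information

Grant number:
8916814
Principal investigator:
James Bentley Brown
Amount:
$249,000 (approx.)
Host institution:
UNIVERSITY OF CALIF-LAWRENC BERKELEY LAB
Host institution country:
United States
Project category:
Fiscal year:
2014
Funding country:
United States
Project period:
2014-08-25 to 2017-05-31
Project status:
Completed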

Source:
https://reporter.nih.gov/project-details/8916814
Keywords:
Algorithms; Animal Disease Models; Animal Model; Area; Automobile Driving; Award; Base Pairing; Biochemical; Biological; Biological Assay; Biological Models; Biological Process; Cells; ChIP-seq; Communication; Complementary DNA; Complex; Data; Data Analyses; Data Sources; Development; Developmental Biology; Disease model; Elements; Gap Junctions; Gene Deletion; Genes; Genome; Genomics; Goals; High-Throughput Nucleotide Sequencing; Human; Human Biology; Indium; Individual; Lead; Link; Maps; Measures; Mentors; Methods; Modeling; Molecular; Mutation; Orphan; Orthologous Gene; Pathway Analysis; Pharmaceutical Preparations; Phenotype; Play; Problem Solving; Property; Protein Isoforms; RNA; Reading; Research; Research Personnel; Running; Semantics; System; Techniques; Technology; Toxic effect; Training; Training Activity; Transcript; Transcriptional Regulation; Variant; Weight; abstracting; analog; base; career development; design; driving force; experience; functional genomics; high throughput screening; human disease; network models; next generation sequencing; novel strategies; statistics; stem cell biology; theories; tool; transcription factor; transcriptome sequencing

Project Abstract

Next generation sequencing has revealed the molecular landscape of cells in unprecedented detail. However, for the massively large-scale data produced by assays based on these technologies, informativeness is not only a function of wet-lab technology, but is critically also a function of the analytical pipelines that interpret the data. Our group has developed four statistical tools designed to maximize the informativeness of these assays: 1) the Genome Structural Correction (GSC), a nonparametric model of genomic annotations used to assess the significance of relationships between features; 2) the Irreproducible Discovery Rate (IDR), an analogue of the FDR that leverages information from biological replicates; 3) Statmap, a comprehensive analysis pipeline for ChIP-seq and CAGE data that propagates statistical confidence from base-calling to peak-calling; and 4) Sparse Linear Isoform Discovery and abundance Estimation (SLIDE), an integrative statistical framework for the analysis of RNA-seq, cDNA, and other RNA data aimed at obtaining and quantifying de novo transcript models. These tools are designed to identify and characterize functional elements in genomes; they make minimal assumptions about the data they analyze, and therefore draw reliable conclusions and measures of statistical confidence. During the K99, we will expand and integrate our tools to extend the reach of statistical confidence throughout data interpretation. During the R00, my research will progress toward the inference and assessment of biological networks. Just as ortholog identification has become an essential step in developing animal models of human disease, multi-species network analysis promises to become a key step in interpreting the relationship between genome variation and phenotype. Many mutations, even gene deletions, do not reveal an obvious phenotype. This is due to network robustness, which often differs between closely related species.
To understand these phenomena, we aim to: 1) develop standard statistical tools for network inference, and 2) develop "meta models" of networks that will permit general measures of network orthology. These two aims are tightly linked: to model biological networks, we will critically need to characterize their semantics. Currently, some models lack consistent definitions of edges and weights, resulting in untestable representations of genomics data. We will develop testable, quantitative models of biological processes, establishing a uniform semantics leveraging the rich theory of complex systems. Each of the tools above will play a key role, especially Statmap and the GSC, which will be needed to propagate statistical confidence into network analysis. Advances will have a transformative effect on our ability to map animal models of disease onto human biology. Nearly nine out of ten new drugs fail in human trials due to issues (e.g., toxicity) not present in animal models. Understanding the orthology not just of individual genes, but of entire biochemical networks will be essential to infer and correct for differences between models of disease and human biology. Solving this problem will be a major step forward in the march from “base-pairs to bedside”.
