Prediction and Network Construction Using High-throughput Data

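Basic Information

Grant number:
8104045
Principal investigator:
Ka Yee Yeung-Rhee
Amount:
approx. $458,000
Host institution:
UNIVERSITY OF WASHINGTON
Institution country:
United States
Project type:
Fiscal year:
2008
Funding country:
United States
Project period:
2008-09-01 to 2013-06-30
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/8104045
Keywords: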
Accounting Algorithms Bioconductor Biological Biological Markers Categories Classification Clinical Communities Computer software Data Data Analyses Data Set Development Diagnostic Diagnostic tests Disease Effectiveness Feedback Gene Deletion Gene Expression Generations Genes Genetic Goals Health Histocompatibility Testing Imagery Individual Information Networks Knowledge Lead Light Link Literature Malignant Neoplasms Mass Spectrum Analysis Measurement Measures Methodology Methods Microarray Analysis Modeling Molecular Profiling Outcome Pathway interactions Patients Peptides Phenotype Play Positioning Attribute Probability Process Property Protein Microchips Proteomics Relapse Relative (related person) Reporting Resources Role Saccharomyces cerevisiae Sampling Serum Signal Transduction Social Network Software Tools Statistical Methods Survival Analysis Technology Testing Time Tissue Sample Tissues Update Validation Visual Yeasts base biological systems clinically relevant computer based statistical methods diagnostic accuracy experience genome-wide high throughput technology improved interest method development network models novel outcome forecast prognostic protein profiling protein protein interaction research study software development software repository tool tumor web site

Project Abstract

DESCRIPTION (provided by applicant): Biomarker identification is becoming an important use for high-throughput technologies like microarrays and mass spectrometry. These high-throughput data (especially microarray data) are used extensively for tissue type classification, including various tumor types, patient survival time prediction, time to relapse, and other clinically relevant temporal quantities. These high-throughput data measure the activity levels of thousands of potential predictors (genes in the case of gene expression data and peptides in the case of mass spectrometry or protein microarray data). The analysis of these data poses difficult statistical problems since the number of features measured is far larger than the number of tissue samples that are typically available. Moreover, many different sets of predictors produce similar prediction accuracies. Here, we propose to incorporate biological knowledge into a supervised framework to identify biologically meaningful predictors for classification and survival analysis. Towards this end, we will develop Bayesian Model Averaging (BMA) methods to produce simple, reliable, robust, and interpretable predictions. BMA also provides a probabilistic multivariate feature selection method. As part of this effort, we will extend the recently developed latent position cluster model for social networks to infer biological networks and identify network modules. Network properties (e.g., modules and the degree of connectivities) confer biological meanings. Hence, we will integrate network properties in a supervised framework to identify biologically meaningful predictors. We will extend the BMA methods to determine predictive network modules and pre-defined gene categories (e.g. GO categories, KEGG pathways). 
This proposal has two main computational thrusts: (1) the development of BMA methods for multi-class classification and survival analysis (Aim 1); and (2) the development of latent position cluster model for inferring biological networks and identifying network modules (Aim 3). These two computational thrusts are unified in Aim 2 in which we use network modules and properties in the supervised BMA framework. In Aim 4, we will generate expression perturbation data to evaluate our network construction methods. Finally, we will make the software and data generated publicly available. The methods developed in this proposal are generally applicable to many high-throughput data types. However, since we will generate expression perturbation data to validate and refine the constructed expression networks, we will focus on applying our developed methods to gene expression data. PUBLIC HEALTH RELEVANCE: Biomarker identification is becoming an important use for high-throughput technologies like microarrays. This proposal aims to identify biologically meaningful predictive biomarkers for tissue type classification, including various tumor types, patient survival time prediction, time to relapse, and other clinically relevant temporal quantities. This project could lead to inexpensive, accurate and robust diagnostic tests that increase the accuracy of diagnoses or prognoses for patients with cancer or other diseases.
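
As an illustration of the BMA idea described in the abstract, the following is a minimal Python sketch and not the applicants' method. It assumes a plain linear regression on a continuous outcome (the proposal itself targets multi-class classification and survival analysis), enumerates all predictor subsets up to a small size, and approximates posterior model probabilities with BIC weights. The function name bma_inclusion_probabilities, the max_size parameter, and the simulated data are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def bma_inclusion_probabilities(X, y, max_size=2):
    """BIC-weighted model averaging over small linear models (illustrative only).

    Returns per-predictor inclusion probabilities, the list of candidate
    models (tuples of column indices), and the normalized model weights.
    """
    n, p = X.shape
    models, bics = [], []
    for size in range(max_size + 1):
        for subset in itertools.combinations(range(p), size):
            # Design matrix: intercept column plus the selected predictors.
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            k = design.shape[1]
            # BIC for a Gaussian linear model, up to an additive constant.
            bics.append(n * np.log(rss / n) + k * np.log(n))
            models.append(subset)

    bics = np.array(bics)
    # Approximate posterior model probabilities: weights proportional to exp(-BIC / 2).
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()

    # Inclusion probability of a predictor = total weight of models containing it.
    inclusion = np.zeros(p)
    for subset, w in zip(models, weights):
        for j in subset:
            inclusion[j] += w
    return inclusion, models, weights


# Simulated data (an assumption for this sketch): only predictors 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=60)

probs, models, weights = bma_inclusion_probabilities(X, y, max_size=3)
for j, prob in enumerate(probs):
    print(f"predictor {j}: inclusion probability {prob:.3f}")
```

Summing model weights per predictor gives a probabilistic, multivariate measure of feature relevance rather than a single selected subset. Exhaustive enumeration is feasible only for a handful of predictors; with thousands of genes, some search or pre-screening strategy would be needed.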
