
Statistical And Computational Methods For Gene Expression and Proteomic Analysis

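Basic Information

Grant number: 8148480
Principal investigator: Peter J. Munson
Amount: $1.0521 million
Host institution: CENTER FOR INFORMATION TECHNOLOGY
Host institution country: United States
Project category:
Fiscal year:
Funding country: United States
Project period:
Project status: Not yet concluded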

Source: https://reporter.nih.gov/project-details/8148480
Keywords: Computing Methodologies, Gene Expression, Proteomics

Project Abstract

Gene expression measurement using cDNA and oligo arrays continues to be a popular and useful technology for genomic analysis. High-throughput methods for measuring protein concentrations are also increasing in popularity. One of the more challenging problems results from the large volume of data generated in these experiments. Quality control and experimental design remain important fundamental issues. Analysis techniques that account for complex array designs and minimize artifacts are required. Many statistical and bioinformatics problems remain and are addressed in this project.
Next-generation sequencing techniques are becoming popular for RNA expression measurement (RNAseq). As with microarrays, a host of technical and quality control issues remain as challenges, in addition to the new statistical problems posed by the discrete measurements (counts) that are returned.
We continue to develop new methods for analysis of alternative gene splicing, based on microarray platforms especially designed for the purpose and, more recently, on RNAseq. Two measurement platforms, the Affymetrix exon array and the ExonHit junction probe array, are being studied. A major study of the effects of the cancer drug Topotecan, a topoisomerase inhibitor, has been completed and accepted for publication. A special version of our analysis package, the MSCL Toolbox, was written for this study, namely ExonSVD. This statistical technique was shown to be highly efficient at identifying genes undergoing alternative splicing, and was less susceptible to the false positives encountered with the earlier ExonANOVA method.
For almost a decade, our group has functioned as the "statistical analysis core" for a high-volume microarray laboratory in CCMD/CC. All microarray studies by this group now pass through our analysis pipeline.
We now also serve as the analysis core for the NHLBI microarray core facility, more than tripling the throughput of microarray studies into our database and pipeline. This "core" facility has generated more than a dozen new collaborative projects per year, in which our staff are primarily responsible for statistical analysis and interpretation of microarray data.
The entire Framingham Heart Study SABRe project has begun to use this new technology, which increases the available transcriptional information by roughly a factor of 10 compared to standard expression arrays. This large project, which will eventually assay up to 5,000 samples, has now completed phase II, the case-control study, which our lab is currently analyzing. The third phase (the remainder of the samples, analyzed in a high-throughput manner) has begun and should be completed in FY11. We are carefully monitoring statistical quality control for this study as it proceeds to analyze almost 200 samples per week. In combination with clinical and other laboratory data, this dataset will no doubt lead to major advances in the understanding of expression signatures and heart disease. The first phase, a feasibility study, analyzed samples from 50 individuals, with four blood-derived sample types per individual: PBMC, lymphoblastoid cell lines, PaxGene tubes and buffy coat. The technical goal was to choose the best, or at least usable, sample types for analysis in the larger study. The results show that PBMC and PaxGene tubes yield results of roughly equal quality. PaxGene was chosen as the sample type for the next two phases.
Affordable, high-quality software availability has been one of the bottlenecks in analysis of microarray data. We have continued development of the "MSCL Analyst's Toolbox" to address this need.
Built upon the commercial statistical package JMP, this toolbox allows investigators to download Affymetrix microarray data from a central database, normalize and transform the data, inspect it for a variety of outliers or defects, perform a variety of statistical tests to select relevant genes affected in the experiment, and then visualize and classify various patterns of gene expression. Because our Toolbox is written in open-source scripts, its statistical tests can be modified as needed to conform to novel or unique experimental designs. In collaboration with over forty investigators in CC, NHLBI, NIDCR and other ICs, this tool has been applied to several dozen microarray studies. One-day and two-day Toolbox training workshops are regularly presented on the NIH campus.
In a major NIH-wide project, we maintain a database for storage, retrieval and analysis of Affymetrix microarrays, NIHAGCC. As part of this collaboration, we have created a data analysis pipeline and bioinformatics toolset, including both commercial and freely available software. The database currently stores information from over 8000 microarrays. Our downloadable toolset (MSCL Analyst's Toolbox) is now mature, widely tested and applied in numerous studies. Working with investigators in NCI, CC, NHLBI, NINDS, NIAID, NHGRI, NICHD, NIA, NIDDK and NIDA, we have developed, customized and applied this software for the analysis of microarray-based studies. We also maintain a quarterly-updated set of annotation files for use with Affymetrix data, in a format for convenient download and use by our collaborators.
In another study with investigators in NEI, we identified a list of retinal pigment epithelium (RPE) "signature" genes, based on comparison of RPE gene expression to catalogs of gene expression levels in other tissues.
This new RPE signature has proven extremely valuable when used in combination with recently completed GWAS studies of adult macular degeneration: the coincidence of signature genes with loci implicated in the GWAS studies was very high, further implicating the RPE tissue as the source of many problems that may cause macular degeneration.
We are now investigating the properties of RNAseq, a method for more accurately assessing the transcriptome using next-generation sequencing technology. In one project, with investigators in NHGRI, we are assessing the reproducibility of the methodology, both within subject and within lane. This project has been extended to a comparison of expression in cells from individuals with or without cardiac calcification. In another, we have analyzed the transcriptome of rat pineal gland, both day and nighttime, and rhesus suprachiasmatic nucleus. We have found a striking number of new, unexpected differences as well as dozens of expression differences already known from microarray analysis. Indeed, about 50% of the "reads" generated in this study do not belong to well-documented rat genes, and are presumably a result of novel transcription from portions of the genome not yet annotated. Further study has refined the list of unannotated but controlled regions to about 50 outstanding regions, likely producing non-coding RNAs (ncRNAs), some of which were found to be pseudogenes of highly expressed genes. Interestingly, it is not the coding regions but the control regions that are found, suggesting that the expression might have a role in control of the true gene itself.
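
The abstract reports that the coincidence of RPE signature genes with GWAS-implicated loci was very high, but does not say how that coincidence was assessed. One common way to judge such an overlap is a one-sided hypergeometric test, sketched below in Python. This is only an illustration under stated assumptions: the function name `hypergeom_upper_tail` and all gene counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered (assumed background set)
    successes:  number of those genes that are in the signature list
    draws:      number of genes implicated by the GWAS loci
    observed:   number of GWAS-implicated genes that are also signature genes
    """
    max_overlap = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts for illustration only (not from the study).
p_value = hypergeom_upper_tail(population=20000, successes=150, draws=40, observed=5)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value here would indicate that the overlap is larger than expected if GWAS-implicated genes were drawn at random from the background set; the choice of background set strongly affects the result and is itself an assumption.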
