Statistical models for biological and technical variation in RNA sequencing


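Basic information

  • Grant number:
    8722575
  • Principal investigator:
  • Amount:
    $307,800
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Project period:
    2013-09-01 to 2018-04-30
  • Project status:
    Completed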

Project abstract

DESCRIPTION (provided by applicant): Since the invention of microarrays, measuring genome-wide gene expression has been one of the most common experiments performed by molecular biologists. Gene expression analysis is also widely used in clinical applications to discover the molecular architecture of disease or to develop prognostic and predictive signatures. RNA-sequencing (RNA-seq) has become the preferred technology for making expression measurements due to declining costs and because RNA-seq is flexible enough to measure expression in regions not previously annotated as genes and to measure the abundances of multiple transcripts for individual genes. Now that RNA-seq data can be collected inexpensively and processed in experiments with replicates, a major challenge is statistical modeling and interpretation of results from RNA-seq experiments. Our proposal will tackle three key practical challenges in RNA-seq data analysis: (1) estimation and removal of hidden artifacts, (2) statistical models for differential expression scanning that do not rely on annotation or assembly, and (3) robust statistical models to correct ambiguous, variable, and unidentifiable assemblies, with specific application to the most popular computational RNA-seq software, Cufflinks. The first aim extends our batch discovery and removal methods to RNA-sequencing data by modeling within-gene and spatial dependence in expression estimates that lead to heavily biased artifact estimates and reduced power. The second aim develops a statistical framework for first identifying regions of differential expression at base-pair resolution, then associating these regions with known genomic landmarks or annotation as a lightweight and accurate scanning approach. This approach builds on the most mature statistical methods for RNA-seq analysis but does not rely on annotation to define transcriptional units such as genes or exons, allowing for unbiased discovery of differential expression.
The third aim develops a statistical normalization and analysis framework that addresses the most egregious artifacts and limitations of the inherently ambiguous transcript assembly process. We will work closely with the developers of the most popular RNA-seq assembly software, Cufflinks, to integrate our developments into that software suite. By modeling variation across genes using functional regression and in the transcript assembly process using hierarchical models, we will reduce the number of false positives and increase the reproducibility of alternative transcript differential expression results. The statistical methods we develop will be packaged in freely available open-source software that is designed to interact with downstream Bioconductor packages for summarization and visualization, such as IRanges or Genominator. The result of this proposal will be a modular, integrated pipeline for analyzing RNA-seq data, from raw reads produced by the sequencing machine to easily summarized and visualized tables of robust, interpretable, and reproducible results, thereby increasing the number and range of applications of RNA-seq in molecular biology and medicine.
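
The second aim's two-step idea (find differentially expressed regions at base-pair resolution, then associate them with annotation) can be illustrated with a toy sketch. This is not the applicants' method or software: the function names, the per-base Welch t-statistic, the fixed threshold, and the coverage values below are all assumptions chosen for illustration, and a real analysis would also need normalization and multiple-testing control.

```python
from math import sqrt
from statistics import mean, variance


def base_t_stats(group_a, group_b):
    """Welch t-statistic at each base (hypothetical helper).

    group_a, group_b: lists of per-sample coverage vectors of equal length,
    with at least two samples per group.
    """
    stats = []
    for i in range(len(group_a[0])):
        a = [sample[i] for sample in group_a]
        b = [sample[i] for sample in group_b]
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        # A base with zero variance in both groups gets a statistic of 0.
        stats.append((mean(a) - mean(b)) / se if se > 0 else 0.0)
    return stats


def call_regions(stats, threshold):
    """Merge consecutive bases with |t| > threshold into half-open (start, end) regions."""
    regions, start = [], None
    for i, t in enumerate(stats):
        if abs(t) > threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(stats)))
    return regions


def annotate(regions, features):
    """Map each region to the names of the annotated features it overlaps.

    features: {name: (start, end)} with half-open coordinates.
    """
    return {
        region: [name for name, (s, e) in features.items()
                 if region[0] < e and s < region[1]]
        for region in regions
    }


# Toy coverage: 3 samples per group, 10 bases; bases 3-5 differ between groups.
group_a = [
    [10, 11, 10, 30, 31, 29, 10, 10, 11, 10],
    [11, 10, 10, 29, 30, 31, 10, 11, 10, 10],
    [10, 10, 11, 31, 29, 30, 11, 10, 10, 10],
]
group_b = [
    [10, 10, 11, 10, 11, 10, 10, 11, 10, 10],
    [11, 10, 10, 11, 10, 10, 11, 10, 10, 10],
    [10, 11, 10, 10, 10, 11, 10, 10, 11, 10],
]
# Made-up annotation used only after regions have been called.
features = {"geneX_exon1": (0, 4), "geneY": (8, 10)}

regions = call_regions(base_t_stats(group_a, group_b), threshold=5.0)
print(regions)
print(annotate(regions, features))
```

The annotation is consulted only after the regions are called, which mirrors the abstract's point that the scan itself does not depend on predefined genes or exons.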

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Jeffrey T. Leek

Tackling the widespread and critical impact of batch effects in high-throughput data
  • DOI:
    10.1038/nrg2825
  • Publication date:
    2010-09-14
  • Journal:
  • Impact factor:
    52.000
  • Authors:
    Jeffrey T. Leek;Robert B. Scharpf;Héctor Corrada Bravo;David Simcha;Benjamin Langmead;W. Evan Johnson;Donald Geman;Keith Baggerly;Rafael A. Irizarry
  • Corresponding author:
    Rafael A. Irizarry
Transparency and reproducibility in artificial intelligence
  • DOI:
    10.1038/s41586-020-2766-y
  • Publication date:
    2020-10-14
  • Journal:
  • Impact factor:
    48.500
  • Authors:
    Benjamin Haibe-Kains;George Alexandru Adam;Ahmed Hosny;Farnoosh Khodakarami;Levi Waldron;Bo Wang;Chris McIntosh;Anna Goldenberg;Anshul Kundaje;Casey S. Greene;Tamara Broderick;Michael M. Hoffman;Jeffrey T. Leek;Keegan Korthauer;Wolfgang Huber;Alvis Brazma;Joelle Pineau;Robert Tibshirani;Trevor Hastie;John P. A. Ioannidis;John Quackenbush;Hugo J. W. L. Aerts
  • Corresponding author:
    Hugo J. W. L. Aerts
Erratum to: Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis
  • DOI:
    10.1186/s12859-016-1152-0
  • Publication date:
    2016-08-10
  • Journal:
  • Impact factor:
    3.300
  • Authors:
    Andrew E. Jaffe;Thomas Hyde;Joel Kleinman;Daniel R. Weinberger;Joshua G. Chenoweth;Ronald D. McKay;Jeffrey T. Leek;Carlo Colantuoni
  • Corresponding author:
    Carlo Colantuoni

Other grants by Jeffrey T. Leek

Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Grant number:
    10598130
  • Fiscal year:
    2022
  • Funding amount:
    $307,800
  • Project category:
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Grant number:
    10330636
  • Fiscal year:
    2022
  • Funding amount:
    $307,800
  • Project category:
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Grant number:
    10654376
  • Fiscal year:
    2022
  • Funding amount:
    $307,800
  • Project category:
A massive study of data science to address the scientific reproducibility crisis
  • Grant number:
    9100338
  • Fiscal year:
    2016
  • Funding amount:
    $307,800
  • Project category:
A massive study of data science to address the scientific reproducibility crisis
  • Grant number:
    9244046
  • Fiscal year:
    2016
  • Funding amount:
    $307,800
  • Project category:
Statistical models for biological and technical variation in RNA sequencing
  • Grant number:
    8593469
  • Fiscal year:
    2013
  • Funding amount:
    $307,800
  • Project category:
Statistical models for biological and technical variation in RNA sequencing
  • Grant number:
    9264553
  • Fiscal year:
    2013
  • Funding amount:
    $307,800
  • Project category:
Core B
  • Grant number:
    9978143
  • Fiscal year:
    2011
  • Funding amount:
    $307,800
  • Project category:
Core B
  • Grant number:
    9304366
  • Fiscal year:
  • Funding amount:
    $307,800
  • Project category:
Core B
  • Grant number:
    9759993
  • Fiscal year:
  • Funding amount:
    $307,800
  • Project category:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Fellowship
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Continuing Grant
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Research Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $307,800
  • Project category:
    Research Grant