Statistical models for biological and technical variation in RNA sequencing

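Basic information

  • Grant number:
    8593469
  • Principal investigator:
    Jeffrey T. Leek
  • Amount:
    $307,800
  • Country of host institution:
    United States
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Project period:
    2013-09-01 to 2018-04-30
  • Status:
    Completed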

Project abstract

DESCRIPTION (provided by applicant): Since the invention of microarrays, measuring genome-wide gene expression has become one of the most common experiments performed by molecular biologists. Gene expression analysis is also widely used in clinical applications to discover the molecular architecture of disease and to develop prognostic and predictive signatures. RNA sequencing (RNA-seq) has become the preferred technology for expression measurement, both because its costs have declined and because it is flexible enough to measure expression in regions not previously annotated as genes and to measure the abundances of multiple transcripts of individual genes. Now that RNA-seq data can be collected inexpensively and processed for experiments with replicates, a major challenge is the statistical modeling and interpretation of results from RNA-seq experiments. Our proposal will tackle three key practical challenges in RNA-seq data analysis: (1) estimation and removal of hidden artifacts, (2) statistical models for differential expression scanning that do not rely on annotation or assembly, and (3) robust statistical models to correct ambiguous, variable, and unidentifiable assemblies, with specific application to the most popular computational RNA-seq software, Cufflinks. The first aim extends our batch discovery and removal methods to RNA-seq data by modeling the within-gene and spatial dependence in expression estimates, which, if ignored, leads to heavily biased artifact estimates and reduced power. The second aim develops a statistical framework that first identifies regions of differential expression at base-pair resolution and then associates these regions with known genomic landmarks or annotation, giving a lightweight and accurate scanning approach. This approach builds on the most mature statistical methods for RNA-seq analysis but does not rely on annotation to define transcriptional units such as genes or exons, allowing unbiased discovery of differential expression.
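The two-step strategy of the second aim (test every base, group significant bases into regions, and only then compare those regions with annotation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's software: it assumes per-base coverage has already been normalized and log-transformed, uses Welch's t-statistic at each base with a fixed, hypothetical cutoff `t_cutoff`, and `per_base_t`, `call_regions`, and `annotate` are illustrative names only.

```python
import numpy as np


def per_base_t(cov_a, cov_b):
    """Welch two-sample t-statistic at every base.

    cov_a, cov_b: arrays of shape (n_samples, n_bases) holding normalized,
    log-scale coverage for the two groups (at least 2 samples per group).
    """
    mean_a, mean_b = cov_a.mean(axis=0), cov_b.mean(axis=0)
    var_a, var_b = cov_a.var(axis=0, ddof=1), cov_b.var(axis=0, ddof=1)
    se = np.sqrt(var_a / cov_a.shape[0] + var_b / cov_b.shape[0])
    # Guard against zero variance (e.g. bases with identical coverage everywhere).
    return (mean_a - mean_b) / np.maximum(se, 1e-8)


def call_regions(stat, t_cutoff):
    """Merge runs of consecutive bases with |stat| > t_cutoff into
    0-based, half-open (start, end) regions."""
    above = np.abs(stat) > t_cutoff
    regions, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                      # a run of significant bases begins
        elif not flag and start is not None:
            regions.append((start, i))     # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(above)))  # run extends to the last base
    return regions


def annotate(regions, features):
    """Attach the names of overlapping features to each region.

    features: list of (name, start, end), 0-based half-open. A region with
    no overlapping feature is a candidate unannotated transcribed region.
    """
    result = []
    for r_start, r_end in regions:
        hits = [name for name, f_start, f_end in features
                if f_start < r_end and r_start < f_end]
        result.append(((r_start, r_end), hits))
    return result


if __name__ == "__main__":
    # Simulated toy data: 4 samples per group, 100 bases, with group B
    # shifted upward over bases 40-59 only. Feature names are hypothetical.
    rng = np.random.default_rng(0)
    cov_a = rng.normal(5.0, 0.1, size=(4, 100))
    cov_b = rng.normal(5.0, 0.1, size=(4, 100))
    cov_b[:, 40:60] += 2.0
    stat = per_base_t(cov_a, cov_b)
    regions = call_regions(stat, t_cutoff=10.0)
    print(annotate(regions, [("exon_1", 30, 50), ("exon_2", 70, 90)]))
```

Because annotation is consulted only after regions are called, a differentially expressed region that overlaps no known feature still surfaces, which is the point of scanning without relying on annotation.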
The third aim develops a statistical normalization and analysis framework that addresses the most egregious artifacts and limitations of the inherently ambiguous transcript assembly process. We will work closely with the developers of the most popular RNA-seq assembly software, Cufflinks, to integrate our developments into that software suite. By modeling variation across genes with functional regression, and variation in the transcript assembly process with hierarchical models, we will reduce the number of false positives and increase the reproducibility of differential expression results for alternative transcripts. The statistical methods we develop will be packaged in freely available open-source software designed to interact with downstream Bioconductor packages for summarization and visualization, such as IRanges or Genominator. The result of this proposal will be a modular, integrated pipeline for analyzing RNA-seq data, from the raw reads produced by the sequencing machine to easily summarized and visualized tables of robust, interpretable, and reproducible results, thereby increasing the number and range of applications of RNA-seq in molecular biology and medicine.
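How a hierarchical model can curb false positives among noisy transcript-level estimates can be illustrated generically with normal-normal empirical-Bayes shrinkage of per-transcript log fold changes. This sketch assumes each estimate comes with a known, positive sampling variance (which one might inflate for transcripts whose assembly is ambiguous) and estimates the prior by the method of moments; the function `eb_shrink` is hypothetical, and this is neither the model developed in this project nor the one used by Cufflinks.

```python
import numpy as np


def eb_shrink(lfc, se2):
    """Normal-normal empirical-Bayes shrinkage.

    Model: lfc_i ~ N(theta_i, se2_i) and theta_i ~ N(mu, tau2).
    Returns the posterior mean and variance of each theta_i.
    """
    lfc = np.asarray(lfc, dtype=float)
    se2 = np.asarray(se2, dtype=float)
    if np.any(se2 <= 0):
        raise ValueError("sampling variances must be positive")
    mu = lfc.mean()
    # Method-of-moments estimate of the between-transcript variance,
    # floored at zero when the observed spread is all sampling noise.
    tau2 = max(0.0, lfc.var(ddof=1) - se2.mean())
    # Weight on each observed value: near 1 for precise estimates,
    # near 0 for noisy ones, which are pulled toward mu.
    w = tau2 / (tau2 + se2)
    post_mean = mu + w * (lfc - mu)
    post_var = w * se2
    return post_mean, post_var


if __name__ == "__main__":
    # Two transcripts share the same observed log fold change (2.0); the
    # second has a much larger sampling variance (hypothetical values).
    lfc = [2.0, 2.0, -1.0, 0.0]
    se2 = [0.01, 4.0, 0.01, 0.01]
    post_mean, post_var = eb_shrink(lfc, se2)
    print(post_mean.round(2), post_var.round(3))
```

The noisy transcript is pulled most of the way toward the overall mean while the precise one barely moves, so a large but unreliable change is less likely to be reported as a discovery.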

Project outputs

Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

Other grants to Jeffrey T. Leek

Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Grant number:
    10598130
  • Fiscal year:
    2022
  • Amount:
    $307,800
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Grant number:
    10330636
  • Fiscal year:
    2022
  • Amount:
    $307,800
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
  • Grant number:
    10654376
  • Fiscal year:
    2022
  • Amount:
    $307,800
A massive study of data science to address the scientific reproducibility crisis
  • Grant number:
    9100338
  • Fiscal year:
    2016
  • Amount:
    $307,800
A massive study of data science to address the scientific reproducibility crisis
  • Grant number:
    9244046
  • Fiscal year:
    2016
  • Amount:
    $307,800
Statistical models for biological and technical variation in RNA sequencing
  • Grant number:
    9264553
  • Fiscal year:
    2013
  • Amount:
    $307,800
Statistical models for biological and technical variation in RNA sequencing
  • Grant number:
    8722575
  • Fiscal year:
    2013
  • Amount:
    $307,800
Core B
  • Grant number:
    9978143
  • Fiscal year:
    2011
  • Amount:
    $307,800
Core B
  • Grant number:
    9304366
  • Amount:
    $307,800
Core B
  • Grant number:
    9759993
  • Amount:
    $307,800

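Similar grants from the National Natural Science Foundation of China (NSFC)

Research on spatiotemporal-sequence-driven neuromorphic visual object recognition algorithms
  • Grant number:
    61906126
  • Approval year:
    2019
  • Amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Ontology-driven spatial-semantic modeling of address data and address matching methods
  • Grant number:
    41901325
  • Approval year:
    2019
  • Amount:
    CNY 220,000
  • Project type:
    Young Scientists Fund
Research on the optimized design of address mapping tables and memory-access optimization for high-capacity solid-state drives
  • Grant number:
    61802133
  • Approval year:
    2018
  • Amount:
    CNY 230,000
  • Project type:
    Young Scientists Fund
Research on IP-address-driven multipath routing and traffic transmission control
  • Grant number:
    61872252
  • Approval year:
    2018
  • Amount:
    CNY 640,000
  • Project type:
    General Program
Research on memory-safety defense techniques against memory-attack targets
  • Grant number:
    61802432
  • Approval year:
    2018
  • Amount:
    CNY 250,000
  • Project type:
    Young Scientists Fund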

Similar overseas grants

Bayesian Statistical Learning for Robust and Generalizable Causal Inferences in Alzheimer Disease and Related Disorders Research
  • Grant number:
    10590913
  • Fiscal year:
    2023
  • Amount:
    $307,800
Predicting firearm suicide in military veterans outside the VA health system using linked civilian electronic health record data
  • Grant number:
    10655968
  • Fiscal year:
    2023
  • Amount:
    $307,800
Deep Learning Based Natural Language Processing Markers of Anxiety and Depression
  • Grant number:
    10723819
  • Fiscal year:
    2023
  • Amount:
    $307,800
Fair risk profiles and predictive models for outcomes of obstructive sleep apnea through electronic medical record data
  • Grant number:
    10678108
  • Fiscal year:
    2023
  • Amount:
    $307,800
Mining minority enriched AllofUs data for innovative ethnic specific risk prediction modeling
  • Grant number:
    10798514
  • Fiscal year:
    2023
  • Amount:
    $307,800