Algorithms and Software for Provably Accurate De Novo RNA-Seq Assembly

Basic Information

  • Award number:
    9145263
  • Amount:
    $458,800
  • Host institution country:
    United States
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Project period:
    2015-09-16 to 2018-06-30
  • Project status:
    Completed

Project Abstract

DESCRIPTION (provided by applicant): RNA-Seq has revolutionized transcriptomics and is one of the most important high-throughput sequencing assays invented in recent years. The key computational problem is de novo assembly: reconstructing the transcripts and their abundances from tens to hundreds of millions of short reads. The problem is challenging because several factors converge: a large number of distinct transcripts (tens of thousands), long repeats shared across transcripts due to alternative splicing, widely varying abundances across transcripts, and read errors. Most existing assemblers are designed around heuristic considerations and implement ad hoc methods, which lead to unreliable transcriptome reconstructions. An accurate RNA-Seq assembler would enable more accurate identification of fusions in cancer transcriptomes, better gene annotations in model and non-model organisms, and more complete analyses of the alternative-splicing dynamics that drive developmental and regulatory programs. In this proposal, we offer a systematic approach to designing RNA-Seq assemblers based on information-theoretic principles. We start by determining conditions on the data that guarantee there is enough information to reconstruct the transcriptome, and then propose an assembly algorithm that can reconstruct the transcriptome using only this minimal information. The algorithm optimally uses the available read information to resolve repeats and disambiguate isoforms. A key insight from the information-theoretic approach is that widely varying abundances across transcripts, rather than being a complication, can be exploited as signatures that distinguish one transcript from another. Based on our initial ideas, we have built an initial prototype and evaluated and compared it against several existing software tools on both real and simulated data.
The encouraging results provide evidence that our approach, which we will fully develop, implement and evaluate during the funded period, can significantly outperform existing software. Additional functionalities, such as mixed short/long read assembly, genome-assisted assembly and joint processing of multiple RNA samples, will be designed and incorporated into the software as part of the proposed project.
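The insight that abundance differences can disambiguate isoforms can be illustrated with a toy example. This is a minimal sketch, not the project's algorithm (which the abstract does not describe). It assumes a single shared exon R (a repeat) entered by two branches and left by two branches, assumes each branch's coverage has already been estimated from the reads, and uses total absolute coverage mismatch as a hypothetical pairing cost. The function name `optimal_pairings` and the branch labels A1, A2, B1, B2 are illustrative only.

```python
from itertools import permutations


def optimal_pairings(in_cov, out_cov):
    """Return every lowest-cost pairing of branches entering and leaving a repeat.

    Hypothetical toy model: in_cov and out_cov map a branch name to its
    estimated coverage. A pairing matches each in-branch to exactly one
    out-branch; its cost is the total absolute coverage mismatch. Brute
    force over all permutations, so only meant for a handful of branches.
    """
    ins, outs = sorted(in_cov), sorted(out_cov)
    if len(ins) != len(outs):
        raise ValueError("toy model assumes equal numbers of in- and out-branches")
    scored = []
    for perm in permutations(outs):
        pairs = list(zip(ins, perm))
        cost = sum(abs(in_cov[i] - out_cov[o]) for i, o in pairs)
        scored.append((cost, pairs))
    best = min(cost for cost, _ in scored)
    # More than one optimal pairing means coverage alone cannot resolve the repeat.
    return [pairs for cost, pairs in scored if cost == best]


if __name__ == "__main__":
    # Isoform A1-R-B2 at ~100x and isoform A2-R-B1 at ~10x share exon R.
    resolved = optimal_pairings({"A1": 98.0, "A2": 11.0}, {"B1": 9.5, "B2": 102.0})
    print(resolved)  # [[('A1', 'B2'), ('A2', 'B1')]]
    # Equal abundances: both pairings fit equally well.
    tied = optimal_pairings({"A1": 50.0, "A2": 50.0}, {"B1": 50.0, "B2": 50.0})
    print(len(tied))  # 2
```

When the two isoforms have clearly different abundances, only one pairing fits and the repeat is resolved; with equal abundances both pairings tie, so coverage alone cannot separate the isoforms and other evidence, such as reads that span the repeat, would be needed.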
Project Outputs

Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

Other Grants by Sreeram Kannan

Defining causal roles of genomic variants on gene regulatory networks with spatiotemporally-resolved single-cell multiomics
  • Award number:
    10297331
  • Fiscal year:
    2021
  • Funding amount:
    $458,800
Defining causal roles of genomic variants on gene regulatory networks with spatiotemporally-resolved single-cell multiomics
  • Award number:
    10474569
  • Fiscal year:
    2021
  • Funding amount:
    $458,800
Algorithms and Software for Provably Accurate De Novo RNA-Seq Assembly
  • Award number:
    9624586
  • Fiscal year:
    2015
  • Funding amount:
    $458,800

Similar Overseas Grants

Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
  • Award number:
    133416
  • Fiscal year:
    2018
  • Funding amount:
    $458,800
  • Project category:
    Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
  • Award number:
    0916351
  • Fiscal year:
    2009
  • Funding amount:
    $458,800
  • Project category:
    Standard Grant