Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly

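Basic information

  • Grant number:
    8518438
  • Principal investigator:
  • Amount:
    $189,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Project period:
    2012-08-01 to 2016-04-30
  • Project status:
    Completed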

Project abstract

DESCRIPTION (provided by applicant): We propose to investigate new computational approaches to two central problems of high-throughput sequence analysis: (1) quantification of transcript and species abundance in RNAseq and metagenomic data, and (2) improved error correction of sequencing reads. The proposed novel approaches to both of these problems derive from the ability to quickly count every instance of every k-mer (string of length k) within huge collections of sequence data. Extensive preliminary work on this problem, manifest in the k-mer counting software (called Jellyfish) published by the project personnel, will be brought to bear and extended.

Existing mapping-based computational techniques for quantifying transcript abundance have found wide applicability, but read mapping is error prone due to, e.g., splice junctions, microexons, and variation from the reference sequence. Aim 1 seeks to develop an alternative, mapping-free approach to transcript quantification from sequencing data that relies on clustering normalized k-mer count vectors to identify k-mers that are indicative of transcript or gene abundance. These k-mers form profiles that can be used to rapidly quantify expression of the given transcript or gene in subsequent experiments with limited computational effort and without the challenging read mapping step.

Aim 2 tackles the problem of error correction of genomic and, more speculatively, RNAseq reads by developing more accurate k-mer filtering methods and more compact de Bruijn graph representations. The new filtering procedures try to make a better distinction between correct and erroneous k-mers by simultaneously considering their position within the reads and the distribution of their quality scores across reads. Improved error correction and de Bruijn graph representations will be used for more efficient algorithms for super-read and unitig creation, the initial stages of assembly.

The methods and software developed for both aims will significantly increase the ability of high-throughput sequence analysis and assembly to be completed on widely available commodity computers.
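
As a rough illustration of the core operation the abstract builds on, the sketch below counts every occurrence of every k-mer in a small set of reads and then flags rarely seen k-mers as candidate sequencing errors. It is a minimal in-memory example and not the Jellyfish implementation. The function names `count_kmers` and `rare_kmers`, the toy reads, and the plain count threshold are all assumptions made for illustration; the filtering proposed in Aim 2 additionally considers k-mer position within reads and quality scores, which this sketch ignores.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every occurrence of every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read, one base at a time.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (candidate errors).

    Hypothetical simplification: a bare count threshold, with no use of
    read position or base quality.
    """
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy reads (made up for this example).
reads = ["ACGTACGT", "ACGTACGA", "CGTACGTT"]
counts = count_kmers(reads, k=4)
suspect = rare_kmers(counts, min_count=2)
```

With these toy reads, `ACGT` is counted four times, while `ACGA` and `CGTT` each appear once and are the only k-mers flagged. A dictionary-based counter like this does not scale to real sequencing data, which is the gap that dedicated k-mer counters address.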

Project outputs

Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
  • DOI:
    10.1038/nbt.2862
  • Published:
    2014-05
  • Journal:
  • Impact factor:
    46.9
  • Authors:
    Patro, Rob; Mount, Stephen M.; Kingsford, Carl
  • Corresponding author:
    Kingsford, Carl
A computational method for designing diverse linear epitopes including citrullinated peptides with desired binding affinities to intravenous immunoglobulin.
  • DOI:
    10.1186/s12859-016-1008-7
  • Published:
    2016-04-08
  • Journal:
  • Impact factor:
    3
  • Authors:
    Patro R; Norel R; Prill RJ; Saez-Rodriguez J; Lorenz P; Steinbeck F; Ziems B; Luštrek M; Barbarini N; Tiengo A; Bellazzi R; Thiesen HJ; Stolovitzky G; Kingsford C
  • Corresponding author:
    Kingsford C
Diffusion archeology for diffusion progression history reconstruction.
  • DOI:
    10.1007/s10115-015-0904-x
  • Published:
    2016-11
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Sefer, Emre; Kingsford, Carl
  • Corresponding author:
    Kingsford, Carl
Salmon provides fast and bias-aware quantification of transcript expression.
  • DOI:
    10.1038/nmeth.4197
  • Published:
    2017-04
  • Journal:
  • Impact factor:
    48
  • Authors:
    Patro R; Duggal G; Love MI; Irizarry RA; Kingsford C
  • Corresponding author:
    Kingsford C
Predicting protein interactions via parsimonious network history inference.
  • DOI:
    10.1093/bioinformatics/btt224
  • Published:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Patro, Rob; Kingsford, Carl
  • Corresponding author:
    Kingsford, Carl

Other grants of Carleton Lee Kingsford

Improved genomic sketching for MUMmer and metagenomics
  • Grant number:
    10453031
  • Fiscal year:
    2022
  • Funding amount:
    $189,700
  • Project type:
Improved genomic sketching for MUMmer and metagenomics
  • Grant number:
    10670162
  • Fiscal year:
    2022
  • Funding amount:
    $189,700
  • Project type:
Data Discovery: Computational Methods for Searching Short-Read Sequencing Experiments
  • Grant number:
    9287168
  • Fiscal year:
    2017
  • Funding amount:
    $189,700
  • Project type:
Data Discovery: Computational Methods for Searching Short-Read Sequencing Experiments - Administrative Supplement
  • Grant number:
    10393953
  • Fiscal year:
    2017
  • Funding amount:
    $189,700
  • Project type:
Algorithms for Managing Uncertainty in Chromosome Conformation Capture Data
  • Grant number:
    8739540
  • Fiscal year:
    2013
  • Funding amount:
    $189,700
  • Project type:
Algorithms for Managing Uncertainty in Chromosome Conformation Capture Data
  • Grant number:
    8579049
  • Fiscal year:
    2013
  • Funding amount:
    $189,700
  • Project type:
Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly
  • Grant number:
    8642468
  • Fiscal year:
    2012
  • Funding amount:
    $189,700
  • Project type:
Accurate Computational Detection of Influenza Reassortments
  • Grant number:
    8072578
  • Fiscal year:
    2010
  • Funding amount:
    $189,700
  • Project type:
Accurate Computational Detection of Influenza Reassortments
  • Grant number:
    7772829
  • Fiscal year:
    2010
  • Funding amount:
    $189,700
  • Project type:

Similar overseas grants

Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
  • Grant number:
    133416
  • Fiscal year:
    2018
  • Funding amount:
    $189,700
  • Project type:
    Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
  • Grant number:
    0916351
  • Fiscal year:
    2009
  • Funding amount:
    $189,700
  • Project type:
    Standard Grant