
Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly


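Basic Information

Grant number: 8518438
Principal investigator: Carleton Lee Kingsford
Award amount: $189,700
Host institution: CARNEGIE-MELLON UNIVERSITY
Host institution country: United States
Project category:
Fiscal year: 2012
Funding country: United States
Project period: 2012-08-01 to 2016-04-30
Project status: Completed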

Source: https://reporter.nih.gov/project-details/8518438
Keywords:
Accounting Algorithmic Software Algorithms Animal Model Bioinformatics Biological Collection Complex Complex Mixtures Computational Technique Computer software Computers Computing Methodologies Data Data Analyses Development Disease Environment Etiology Gene Expression Genes Genome Genomics Goals Graph Hereditary Disease High-Throughput Nucleotide Sequencing Human Resources Jellyfish Length Location Maps Measures Memory Metagenomics Methods Morphologic artifacts Operating System Performance Positioning Attribute Procedures Process Protein Isoforms Publishing RNA RNA Sequences RNA Splicing Reading Relative (related person) Research Sampling Scheme Sequence Analysis Staging Structure Symbiosis Taxon Time Tissue-Specific Gene Expression Transcript Ursidae Family Variant Work base biological research data structure design exome sequencing improved insight metagenomic sequencing microbial community microorganism multicore processor next generation novel novel strategies processing speed research study response software development tool vector

Project Abstract

DESCRIPTION (provided by applicant): We propose to investigate new computational approaches to two central problems of high-throughput sequence analysis: (1) quantification of transcript and species abundance in RNAseq and metagenomic data, and (2) improved error correction of sequencing reads. The proposed novel approaches to both of these problems derive from the ability to quickly count every instance of every k-mer (string of length k) within huge collections of sequence data. Extensive preliminary work on this problem, manifest in the k-mer counting software (called Jellyfish) published by the project personnel, will be brought to bear and extended. Existing mapping-based computational techniques for quantifying transcript abundance have found wide applicability but read mapping is error prone due to, e.g., splice junctions, microexons, and variation from the reference sequence. Aim 1 seeks to develop an alternative, mapping-free approach to transcript quantification from sequencing data that relies on clustering normalized k-mer count vectors to identify k-mers that are indicative of transcript or gene abundance. These k-mers form profiles that can be used to rapidly quantify expression of the given transcript or gene in subsequent experiments with limited computational effort and avoiding the challenging read mapping step. Aim 2 tackles the problem of error correction of genomic, and, more speculatively, RNAseq reads by developing more accurate k-mer filtering methods and more compact de Bruijn graph representations. The new filtering procedures try to make a better distinction between correct and erroneous k-mers by simultaneously considering their position within the reads and the distribution of their quality scores across reads. Improved error correction and de Bruijn graph representations will be used for more efficient algorithms for super-read and unitig creation, the initial stages of assembly.
The methods and software developed for both aims will significantly increase the ability of high-throughput sequence analysis and assembly to be completed on widely available commodity computers.
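
As a rough illustration of the k-mer counting operation the abstract builds on, the sketch below counts every occurrence of every length-k substring in a small set of reads. It is a minimal, single-threaded example using a plain Python `Counter`; it is not how Jellyfish is implemented, and the function name `count_kmers` and the toy reads are assumptions made for this example only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every occurrence of every length-k substring across all reads.

    Hypothetical helper for illustration; reads shorter than k contribute nothing.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read, one base at a time.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy input: two overlapping reads (made-up sequences).
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Each read of length n contributes n - k + 1 k-mers, so the table grows with the total amount of sequence; that scaling is why counting speed and memory use matter for the large collections the abstract mentions.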


Project Outcomes

Journal articles (12)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.

DOI: 10.1038/nbt.2862
Publication date: 2014-05
Journal: NATURE BIOTECHNOLOGY
Impact factor: 46.9
Authors: Patro, Rob; Mount, Stephen M.; Kingsford, Carl
Corresponding author: Kingsford, Carl

A computational method for designing diverse linear epitopes including citrullinated peptides with desired binding affinities to intravenous immunoglobulin.

DOI: 10.1186/s12859-016-1008-7
Publication date: 2016-04-08
Journal: BMC Bioinformatics
Impact factor: 3
Authors: Patro R; Norel R; Prill RJ; Saez-Rodriguez J; Lorenz P; Steinbeck F; Ziems B; Luštrek M; Barbarini N; Tiengo A; Bellazzi R; Thiesen HJ; Stolovitzky G; Kingsford C
Corresponding author: Kingsford C

Diffusion archeology for diffusion progression history reconstruction.