BIGDATA: Low-Memory Streaming Prefilters for Biological Sequencing Data

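Basic Information

  • Grant number:
    8599821
  • Principal investigator:
  • Amount:
    $249,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Start and end dates:
    2013-07-19 to 2016-05-31
  • Project status:
    Completed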

Project Abstract

DESCRIPTION (provided by applicant): We will soon be able to exhaustively sequence the DNA and RNA of entire communities of bacteria, as well as every individual cell of a tumor. Both of these very different applications of sequencing share the need to rapidly and efficiently sort through large amounts of noisy sequence data (dozens to 100s of terabases) to separate signal from noise and produce biological insight. However, current bioinformatics approaches for extracting information from this data cannot easily handle the vast amounts of data being acquired. The primary challenges in processing this sequence data are twofold: the relatively high error rate of 0.1-1% per base, and the volume of data we can now easily acquire with sequencers such as Illumina HiSeq. For years, sequencing capacity has been doubling every 6 months, significantly faster than compute capacity. Since almost all extant bioinformatics analysis approaches require multiple passes across the primary data, and many analysis algorithms have not been parallelized, bioinformatics analysis capacity continues to lag ever further behind data generation capacity. In addition, many of the existing software packages cannot easily be retooled to take advantage of many-core or GPU algorithms, and hence will not take advantage of expected advances in compute capacity and cyberinfrastructure. We propose to develop and implement novel streaming approaches for lossy compression and error correction in shotgun sequencing data. Our algorithms are few-pass (< 2), require no sample-specific information, and can be implemented in fixed or low memory; moreover, they are amenable to parallelization and can run efficiently in many-core environments. When implemented as a prefilter to existing analysis packages, our approaches will eliminate or correct the majority of errors in data sets, dramatically reducing the computational space and time requirements for downstream analysis using existing packages.
Moreover, we will provide novel capability by extending error correction approaches to mRNAseq and metagenomic data sets.

Intellectual Merit: We will develop a range of algorithms for space- and time-efficient compression and error correction of short-read DNA and RNA sequence data. These strategies will substantially increase the scalability of many downstream analysis applications, ranging from community analysis of metagenomes to resequencing analysis of humans. We will provide analyses describing the tradeoffs between space and time efficiency and sensitivity, and deliver tested, documented reference implementations of our approaches that can be used by the community for practical evaluation and incorporation into analysis tools. Our approaches will significantly impact short-read sequence analysis by introducing efficient and effective streaming approaches to the two most common types of short-read analysis, mapping and assembly.
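
The abstract does not say which data structures the proposed prefilters use. As a rough illustration only, the Python sketch below shows one way a single-pass, fixed-memory lossy-compression prefilter could work: it keeps approximate k-mer counts in a Count-Min Sketch and drops any read whose median k-mer count has already reached a cutoff. The Count-Min Sketch, the median rule, and all names and parameters (`CountMinSketch`, `streaming_prefilter`, `k`, `cutoff`, `width`, `depth`) are assumptions made for this example, not details taken from the project.

```python
import hashlib
from statistics import median


class CountMinSketch:
    """Fixed-memory approximate counter (hypothetical helper for this sketch)."""

    def __init__(self, width=1 << 16, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters; this size never grows with the input.
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # One hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._slots(item):
            self.rows[row][col] += 1

    def count(self, item):
        # Collisions can only inflate a counter, so take the minimum over rows.
        return min(self.rows[row][col] for row, col in self._slots(item))


def kmers(read, k):
    """Return all overlapping substrings of length k in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def streaming_prefilter(reads, k=5, cutoff=3, sketch=None):
    """Yield reads whose median k-mer count is still below `cutoff`.

    Assumed rule for illustration: a read made mostly of k-mers that have
    already been seen `cutoff` times is treated as redundant and dropped.
    The input is consumed once, read by read.
    """
    if sketch is None:
        sketch = CountMinSketch()
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read is shorter than k
        if median(sketch.count(w) for w in words) < cutoff:
            # Only reads that are kept contribute to the counts.
            for w in words:
                sketch.add(w)
            yield read
```

Under these assumptions, and barring hash collisions, feeding the filter many copies of one read keeps only the first `cutoff` copies, while a read made of unseen k-mers passes through. The number of counters stays at `width * depth` regardless of how many reads are streamed.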

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by C. Titus BROWN

Tools and Workflows for Mining Genomic Data on Many Clouds
  • Grant number:
    9559842
  • Fiscal year:
    2017
  • Funding amount:
    $249,900
  • Project category:
BIGDATA: Low-Memory Streaming Prefilters for Biological Sequencing Data
  • Grant number:
    8703739
  • Fiscal year:
    2013
  • Funding amount:
    $249,900
  • Project category:
Analyzing Next-Generation Sequencing Data
  • Grant number:
    8150859
  • Fiscal year:
    2011
  • Funding amount:
    $249,900
  • Project category:
Analyzing Next-Generation Sequencing Data
  • Grant number:
    8551251
  • Fiscal year:
    2011
  • Funding amount:
    $249,900
  • Project category:
Analyzing Next-Generation Sequencing Data
  • Grant number:
    8323321
  • Fiscal year:
    2011
  • Funding amount:
    $249,900
  • Project category:
Analyzing Next-Generation Sequencing Data
  • Grant number:
    8728301
  • Fiscal year:
    2011
  • Funding amount:
    $249,900
  • Project category:

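Similar NSFC (National Natural Science Foundation of China) Grants

Segmented filamentous bacteria activate the host immune system to suppress their antagonists, Enterobacteriaceae, and maintain microbiota balance: a mechanistic study
  • Grant number:
    81971557
  • Approval year:
    2019
  • Funding amount:
    CNY 650,000
  • Project category:
    General Program
Response of cable bacteria to organic pollution in aquatic sediments and the underlying regulatory mechanisms
  • Grant number:
    51678163
  • Approval year:
    2016
  • Funding amount:
    CNY 640,000
  • Project category:
    General Program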

Similar Overseas Grants

Cell Wall Formation in Rod Shaped Bacteria
  • Grant number:
    BB/Y003187/1
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Research Grant
Did light dictate ancient diversification of phylogeny and cell structure in the domain bacteria?
  • Grant number:
    24H00582
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Conference: Symposium on the Immune System of Bacteria
  • Grant number:
    2349218
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Standard Grant
DNA replication dynamics in living bacteria
  • Grant number:
    23K25843
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Grant-in-Aid for Scientific Research (B)
DYNBIOTICS - Understanding the dynamics of antibiotics transport in individual bacteria
  • Grant number:
    EP/Y023528/1
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Research Grant
NPBactID - Differential binding of peptoid functionalized nanoparticles to bacteria for identifying specific strains
  • Grant number:
    EP/Y029542/1
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Fellowship
Assembly of the matrix that supports bacteria living in biofilms
  • Grant number:
    2468773
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Studentship
Cell wall dynamics in Gram-positive bacteria
  • Grant number:
    502580
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
BacNLR - Functional diversity of NLRs in multicellular bacteria
  • Grant number:
    EP/Z000092/1
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Research Grant
Manipulating two-component systems to activate cryptic antibiotic pathways in filamentous actinomycete bacteria
  • Grant number:
    BB/Y005724/1
  • Fiscal year:
    2024
  • Funding amount:
    $249,900
  • Project category:
    Research Grant