Improved genomic sketching for MUMmer and metagenomics

Basic information

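  • Grant number:
    10453031
  • Principal investigator:
  • Amount:
    $484,400
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-07-22 to 2026-05-31
  • Project status:
    Ongoing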

Project summary

Increasing the efficiency of computational methods has been instrumental to extracting insight from genomic data. Fast aligners such as MUMMER, fast k-mer counters such as JELLYFISH, fast expression quantifiers such as SAILFISH and SALMON, and high-quality efficient genome assemblers such as MASURCA have been crucial to unlocking the potential of genomic and metagenomic data. Nevertheless, computation remains a time and cost bottleneck in many application areas. Algorithmic sketching methods, such as the minimizer schemes, have been a useful technique for achieving improved computational efficiency. However, despite their importance, these sketching techniques are understudied from a theoretical perspective and underused from a practical perspective.

We propose to design, implement, test, and validate new sketching approaches based on significant extensions to the successful minimizers sketching schemes, greatly increasing the flexibility of these approaches and expanding their use into new areas including handling high-variance or highly repetitive sequences, and providing a new, standard sketching toolkit for genomic method designers and software implementors. These extensions, collectively referred to as marker selection schemes, will enable faster alignment, clustering, and assembly of genomic sequences, and will spur further computational innovation in genomic applications.

To inform and validate this algorithmic work, we propose to enhance three important and broad areas of genomic computational methods. First, we will extend the widely used MUMMER aligner with a number of application-specific “modes” that exploit these new and existing sketching schemes to achieve enhanced efficiency and greater sensitivity. This will ensure continued development and enhancement for additional applications of this important computational tool. Second, we will enhance the MASURCA genome assembler with updated integration with the new MUMMER. Third, we will use the developed marker selection schemes and additional algorithmic ideas based on geometric embedding of sequences to develop more accurate, fast estimators of distances between genomic sequences. These approximate distance estimators are essential for a number of metagenomic applications including species classification, clustering, and search. We will advance the computational accuracy of these tasks through these improved estimators.

This project will result in a deeper toolbox of genomic sketching and distance estimation algorithms, software libraries encoding these new algorithms for wider use by the community, and an improved suite of genomic software, including enhancements to a widely used aligner and assembler and improved accuracy in existing and new metagenomic software.
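
For readers unfamiliar with the minimizer schemes named in the summary, the following is a minimal Python sketch of the basic idea, not the project's implementation. It assumes a plain lexicographic order on k-mers (practical tools typically use a hash-based order), and the function names `minimizers`, `jaccard`, and `sketch_similarity`, as well as the default values of `k` and `w`, are hypothetical choices made for illustration. Each window of `w` consecutive k-mers contributes its smallest k-mer to the sketch, and the Jaccard similarity of two sketches serves as a rough stand-in for a sequence distance estimator.

```python
def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of seq.

    Each window of w consecutive k-mers contributes its smallest k-mer
    under lexicographic order (leftmost k-mer wins ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Index of the smallest k-mer in this window, leftmost on ties.
        offset = min(range(w), key=lambda j: (window[j], j))
        selected.add((start + offset, window[offset]))
    return selected


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sketch_similarity(s, t, k=3, w=4):
    """Jaccard similarity between the minimizer k-mer sets of s and t."""
    ms = {kmer for _, kmer in minimizers(s, k, w)}
    mt = {kmer for _, kmer in minimizers(t, k, w)}
    return jaccard(ms, mt)
```

Because adjacent windows often share the same smallest k-mer, the sketch is usually much smaller than the full k-mer set, which is where the efficiency gain comes from.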

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Carleton Lee Kingsford

Improved genomic sketching for MUMmer and metagenomics
  • Grant number:
    10670162
  • Fiscal year:
    2022
  • Funding amount:
    $484,400
  • Project category:
Data Discovery: Computational Methods for Searching Short-Read Sequencing Experiments
  • Grant number:
    9287168
  • Fiscal year:
    2017
  • Funding amount:
    $484,400
  • Project category:
Data Discovery: Computational Methods for Searching Short-Read Sequencing Experiments - Administrative Supplement
  • Grant number:
    10393953
  • Fiscal year:
    2017
  • Funding amount:
    $484,400
  • Project category:
Algorithms for Managing Uncertainty in Chromosome Conformation Capture Data
  • Grant number:
    8739540
  • Fiscal year:
    2013
  • Funding amount:
    $484,400
  • Project category:
Algorithms for Managing Uncertainty in Chromosome Conformation Capture Data
  • Grant number:
    8579049
  • Fiscal year:
    2013
  • Funding amount:
    $484,400
  • Project category:
Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly
  • Grant number:
    8642468
  • Fiscal year:
    2012
  • Funding amount:
    $484,400
  • Project category:
Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly
  • Grant number:
    8518438
  • Fiscal year:
    2012
  • Funding amount:
    $484,400
  • Project category:
Accurate Computational Detection of Influenza Reassortments
  • Grant number:
    8072578
  • Fiscal year:
    2010
  • Funding amount:
    $484,400
  • Project category:
Accurate Computational Detection of Influenza Reassortments
  • Grant number:
    7772829
  • Fiscal year:
    2010
  • Funding amount:
    $484,400
  • Project category:

Similar overseas grants

Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
  • Grant number:
    133416
  • Fiscal year:
    2018
  • Funding amount:
    $484,400
  • Project category:
    Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
  • Grant number:
    0916351
  • Fiscal year:
    2009
  • Funding amount:
    $484,400
  • Project category:
    Standard Grant