A Desktop Assembly and Analysis Pipeline for Next-gen Metagenomic Sequencing

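Basic Information

  • Grant number:
    8200467
  • Principal investigator:
  • Amount:
    $152,900
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2011
  • Funding country:
    United States
  • Project period:
    2011-08-05 to 2013-01-31
  • Project status:
    Completed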

Project Abstract

DESCRIPTION (provided by applicant): Over the last decade, bacterial and archaeal communities have been identified in virtually every ecological niche examined, from acid mine drainage to tropospheric clouds to the human body. Understanding the makeup of, and interactions within, these communities is crucial to understanding how each ecosystem functions. Metagenomics (or community genomics) is the approach whereby the collective genomic content of a naturally occurring microbial community can be determined without the need to isolate and culture its constituents independently. When assembled, the genetic framework of the community can be obtained, including critical information on population structure, phylogenetic diversity, and novel genetic and biochemical activities. The potential impact of such findings on biotechnology, medicine and ecology is enormous. Recently, next-gen sequencing technologies (e.g. Roche/454, Illumina, and Life Technologies (SOLiD)) have replaced traditional Sanger sequencing for metagenomic data generation. These are cost-effective, clone-free, massively parallel technologies capable of producing as many as 250 gigabases of data in a single machine run. That level of sequencing makes it feasible to reconstruct even low-abundance genomes as well as to determine intraspecies heterogeneity from a community sample. However, the next-gen technologies also present their own computational challenges, including the sheer volume of data to be processed together with the different read lengths, error models and formats unique to each technology. These complexities, together with those posed by the communities themselves, have left researchers to cobble together various combinations of software tools to process and analyze their data. Most software that can handle these large, complex data sets also requires substantial computing resources and computer expertise beyond those of a normally equipped lab.

These difficulties continue to have a serious stifling effect on the advances in science and technology that metagenomics offers. The long-term goal of this proposal is to develop a seamless, commercial-grade metagenomic sequence assembly and analysis pipeline that is fully scalable to any size project. The proposed software will be easy to use and will run on a desktop computer costing less than $5000, so that any reasonably funded laboratory or clinic can exploit metagenomic technology. Toward that goal, this Phase I proposal focuses on a solution to the central task of processing massive next-gen metagenomic data sets on a desktop computer. We will evaluate whether our new non-memory-bound assembly engine, XNG, can meet the challenges in three crucial steps: 1) removing reads derived from contaminating host DNA, 2) "recruiting" the potentially hundreds of millions of remaining reads into appropriate phylogenetic bins based on matches to a local reference genome database, and 3) converting genome sequences from multiple strains of a given species into a single annotated entry (the "pan-genome") for enhanced read recruitment and downstream annotation.

PUBLIC HEALTH RELEVANCE: Human health and medicine are greatly influenced by the microbial communities that are integral parts of our bodies, as well as by those that produce useful antibiotics, for example. To better exploit the potential of these communities, metagenomic studies are producing vast amounts of next-gen DNA sequence data to decipher their collective genetic content. This project focuses on developing computer software capable of reconstructing microbial community genomes and analyzing their content.
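
The three Phase I steps named in the abstract (host-read removal, read recruitment into bins, and pooling strains into one pan-genome entry) can be illustrated with a toy sketch. This is not the XNG engine, whose internals the abstract does not describe; it swaps in a simple exact k-mer matching scheme purely for illustration. All function names, the `min_shared` threshold, and the in-memory dictionaries are assumptions, and a real pipeline would need error-tolerant matching and an index that does not have to fit in memory.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the labels whose sequences contain it.

    references maps a label (e.g. a species name) to a list of strain
    sequences. Pooling all strains under one label is a crude stand-in
    for the pan-genome idea (step 3), not a real pan-genome build.
    """
    index = defaultdict(set)
    for label, strains in references.items():
        for seq in strains:
            for km in kmers(seq, k):
                index[km].add(label)
    return index


def recruit(read, index, k, min_shared=2):
    """Return the label sharing the most k-mers with read, or None.

    Ties are broken alphabetically; labels below min_shared are rejected.
    """
    counts = defaultdict(int)
    for km in kmers(read, k):
        for label in index.get(km, ()):
            counts[label] += 1
    if not counts:
        return None
    best = max(sorted(counts), key=lambda label: counts[label])
    return best if counts[best] >= min_shared else None


def bin_reads(reads, host, references, k=8, min_shared=2):
    """Drop host-like reads (step 1), then bin the rest (step 2)."""
    host_index = build_index({"host": [host]}, k)
    ref_index = build_index(references, k)
    bins = defaultdict(list)
    for read in reads:
        if recruit(read, host_index, k, min_shared) == "host":
            continue  # step 1: discard reads that match the host sequence
        label = recruit(read, ref_index, k, min_shared)
        bins[label or "unassigned"].append(read)
    return dict(bins)
```

Calling `bin_reads(reads, host, references)` with a list of read strings, one host sequence, and a dictionary mapping each species label to its strain sequences returns a dictionary of bins keyed by species label, with reads that match nothing collected under `"unassigned"`.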

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by TIMOTHY J DURFEE

Long read based sequencing software for the comprehensive analysis of clinical samples
  • Grant number:
    10009727
  • Fiscal year:
    2020
  • Funding amount:
    $152,900
  • Project type:
Scalable post-assembly editing software for finishing and annotating personal genomes
  • Grant number:
    9883809
  • Fiscal year:
    2018
  • Funding amount:
    $152,900
  • Project type:
Scalable post-assembly editing software for finishing and annotating personal genomes
  • Grant number:
    9767335
  • Fiscal year:
    2018
  • Funding amount:
    $152,900
  • Project type:
Complete genome de novo assembly software for the emerging long read sequencing era
  • Grant number:
    9255092
  • Fiscal year:
    2017
  • Funding amount:
    $152,900
  • Project type:
Complete genome de novo assembly software for the emerging long read sequencing era
  • Grant number:
    9747613
  • Fiscal year:
    2017
  • Funding amount:
    $152,900
  • Project type:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
  • Grant number:
    8236680
  • Fiscal year:
    2012
  • Funding amount:
    $152,900
  • Project type:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
  • Grant number:
    8727829
  • Fiscal year:
    2012
  • Funding amount:
    $152,900
  • Project type:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
  • Grant number:
    8703156
  • Fiscal year:
    2012
  • Funding amount:
    $152,900
  • Project type:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
  • Grant number:
    8624982
  • Fiscal year:
    2012
  • Funding amount:
    $152,900
  • Project type:
Integrated Assembly Software for Sanger and Next Generation Sequence Technologies
  • Grant number:
    8011298
  • Fiscal year:
    2007
  • Funding amount:
    $152,900
  • Project type:

Similar Overseas Grants

Can antibiotics disrupt biogeochemical nitrogen cycling in the coastal ocean?
  • Grant number:
    2902098
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    Studentship
Metallo-Peptides: Arming Cyclic Peptide Antibiotics with New Weapons to Combat Antimicrobial Resistance
  • Grant number:
    EP/Z533026/1
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    Research Grant
The role of RNA repair in bacterial responses to translation-inhibiting antibiotics
  • Grant number:
    BB/Y004035/1
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    Research Grant
DYNBIOTICS - Understanding the dynamics of antibiotics transport in individual bacteria
  • Grant number:
    EP/Y023528/1
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    Research Grant
Towards the sustainable discovery and development of new antibiotics
  • Grant number:
    FT230100468
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    ARC Future Fellowships
Engineering Streptomyces bacteria for the sustainable manufacture of antibiotics
  • Grant number:
    BB/Y007611/1
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    Research Grant
The disulfide bond as a chemical tool in cyclic peptide antibiotics: engineering disulfide polymyxins and murepavadin
  • Grant number:
    MR/Y033809/1
  • Fiscal year:
    2024
  • Funding amount:
    $152,900
  • Project type:
    Research Grant
Role of phenotypic heterogeneity in mycobacterial persistence to antibiotics: Prospects for more effective treatment regimens
  • Grant number:
    494853
  • Fiscal year:
    2023
  • Funding amount:
    $152,900
  • Project type:
    Operating Grants
Imbalance between cell biomass production and envelope biosynthesis underpins the bactericidal activity of cell wall-targeting antibiotics
  • Grant number:
    2884862
  • Fiscal year:
    2023
  • Funding amount:
    $152,900
  • Project type:
    Studentship
Narrow spectrum antibiotics for the prevention and treatment of soft-rot plant disease
  • Grant number:
    2904356
  • Fiscal year:
    2023
  • Funding amount:
    $152,900
  • Project type:
    Studentship