Phylogenetic Binning of Metagenomic Sequence Data

Basic Information

Project Abstract

DESCRIPTION (provided by applicant): Culture-independent metagenomic studies are essential for understanding our relationship with the organisms that make up the human microbiome, for defining the microbial composition that best maintains health, and for devising selective treatment strategies that eliminate pathogens without harming beneficial species. To use metagenomic data effectively, raw DNA sequence data (reads) must be processed computationally (assembled) into longer sequences (contigs). Existing software packages for this purpose are quite inefficient on large, taxonomically diverse samples, leaving a considerable fraction of reads unassembled and therefore wasted. Relaxing stringency to maximize assembly efficiency can instead join sequences from unrelated organisms (chimeric artifacts), compromising the accuracy and usefulness of the data. Taxonomic binning of raw reads as a pre-filtering step is expected to improve metagenomic assembly efficiency: it reduces the statistical noise caused by sample complexity and allows raw reads to be incorporated into longer, more informative contigs without producing chimeric artifacts. The benefit should be greatest for less abundant species in complex mixtures. We have developed methods, including reproducible calibration standards, to quantify the performance of taxonomic binning programs and the resulting assembly improvements on real metagenomic data sets. These methods enable efficient parameter optimization of existing software and provide reliable benchmarks for future software development.
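
The abstract does not say how reads would be assigned to taxa. As a purely illustrative sketch, the Python code below bins reads by oligonucleotide composition, one widely used composition-based approach: each read is reduced to a normalized tetranucleotide (4-mer) frequency vector and assigned to the reference taxon whose profile is nearest by Euclidean distance. The choice of k = 4, the distance measure, and the names `kmer_profile`, `euclidean`, and `bin_reads` are assumptions made for this example, not the project's method; a practical binner would also merge reverse-complement k-mers and treat very short or ambiguous reads with more care.

```python
import math
from itertools import product

# Assumed parameter: tetranucleotides (k = 4), giving 4**4 = 256 possible k-mers.
K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_profile(seq: str) -> list[float]:
    """Normalized k-mer frequency vector; k-mers containing non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)  # read too short or fully ambiguous
    return [c / total for c in counts]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bin_reads(reads: dict[str, str], references: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical binner: assign each read ID to the taxon with the nearest composition profile."""
    ref_profiles = {taxon: kmer_profile(genome) for taxon, genome in references.items()}
    bins: dict[str, list[str]] = {taxon: [] for taxon in references}
    for read_id, seq in reads.items():
        profile = kmer_profile(seq)
        nearest = min(ref_profiles, key=lambda t: euclidean(profile, ref_profiles[t]))
        bins[nearest].append(read_id)
    return bins
```

For example, with toy references `{"taxonA": "AT" * 200, "taxonB": "GC" * 200}`, the read `"ATATATATATATAT"` lands in the `taxonA` bin and `"GCGCGCGCGCGC"` in the `taxonB` bin. Each bin could then be assembled separately, which is the pre-filtering idea described above. Short reads contain few 4-mers, so composition alone is a noisy signal at read length.
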
Our specific aims are to: 1) develop new computational methods for large-scale taxonomic classification of metagenomic sequence data, applicable to raw reads as well as assembled contigs; 2) develop software and protocols that use taxonomic binning as a pre-treatment to increase the efficiency of existing sequence assembly software; 3) benchmark the performance gains for different assembly programs using quantitative statistical tests on both artificially constructed models and real metagenomic data sets of varying size and complexity; and 4) make the new computational methods and performance evaluation tools available to the broader scientific community.
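
Aim 3 calls for quantitative benchmarks, but the abstract does not name the metrics. As a hedged sketch of measures such a benchmark could report, the code below computes per-taxon precision and recall of read binning against the known read origins of an artificially constructed (simulated) data set, plus the N50 contig length commonly used to summarize assembly contiguity. The function names and metric definitions are one common convention chosen for illustration, not necessarily those used in the project, and the statistical tests mentioned in the aim are not shown.

```python
def binning_precision_recall(
    predicted: dict[str, str], truth: dict[str, str]
) -> dict[str, tuple[float, float]]:
    """Per-taxon (precision, recall) of read assignments.

    predicted: read ID -> assigned taxon (reads left out count as unassigned).
    truth: read ID -> true source taxon, known for a simulated data set.
    """
    scores = {}
    for taxon in sorted(set(truth.values()) | set(predicted.values())):
        assigned = {r for r, t in predicted.items() if t == taxon}
        actual = {r for r, t in truth.items() if t == taxon}
        correct = len(assigned & actual)
        precision = correct / len(assigned) if assigned else 0.0
        recall = correct / len(actual) if actual else 0.0
        scores[taxon] = (precision, recall)
    return scores


def n50(contig_lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For contig lengths `[100, 200, 300, 400]`, `n50` returns 300, because the two longest contigs (700 of 1,000 bases) are the first to cover half the total. Comparing the N50 of an assembly of all reads with that of the contigs obtained by assembling each bin separately is one way to express the assembly improvement the abstract refers to.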

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Eric Ellsworth Allen

Natural Sources and Microbial Transformation of Marine Halogenated Pollutants
  • Grant number: 10307709
  • Fiscal year: 2021
  • Funding amount: $193,100

Natural Sources and Microbial Transformation of Marine Halogenated Pollutants
  • Grant number: 10443787
  • Fiscal year: 2018
  • Funding amount: $193,100

Natural Sources and Microbial Transformation of Marine Halogenated Pollutants
  • Grant number: 10207635
  • Fiscal year: 2018
  • Funding amount: $193,100

Phylogenetic Binning of Metagenomic Sequence Data
  • Grant number: 7708544
  • Fiscal year: 2009
  • Funding amount: $193,100

Similar Overseas Grants

DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Grant number: EP/Y029089/1
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Research Grant

CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Grant number: 2337776
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Continuing Grant

CAREER: From Dynamic Algorithms to Fast Optimization and Back
  • Grant number: 2338816
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Continuing Grant

CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Grant number: 2338846
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Continuing Grant

CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Grant number: 2348261
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Standard Grant

CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Grant number: 2348346
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Standard Grant

CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
  • Grant number: 2348457
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Standard Grant

EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Grant number: 2404989
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Standard Grant

CAREER: Efficient Algorithms for Modern Computer Architecture
  • Grant number: 2339310
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Continuing Grant

CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Grant number: 2339669
  • Fiscal year: 2024
  • Funding amount: $193,100
  • Grant type: Continuing Grant