Developing Advanced Algorithms to Address Major Computational Challenges in Current Microbiome Research

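Basic Information

Grant number: 9270498
Principal investigator: Yijun Sun
Amount: $311,200
Host institution: STATE UNIVERSITY OF NEW YORK AT BUFFALO
Host institution country: United States
Project category:
Fiscal year: 2016
Funding country: United States
Project period: 2016-05-15 to 2019-04-30
Project status: Completed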

Project Abstract

We propose a three-year interdisciplinary research plan to address two key issues currently facing the metagenomics community.

The first issue concerns the accurate construction and annotation of OTU tables from millions of 16S rRNA sequences, which is one of the most important yet most difficult problems in microbiome data analysis. Currently, the field lacks computational algorithms capable of handling extremely large sequence data and constructing biologically consistent OTU tables. We propose a novel method that performs OTU table construction and annotation simultaneously by utilizing input and reference sequences, reference annotations, and data clustering structure within one analytical framework. Dynamic data-driven cutoffs are derived to identify OTUs that are consistent not only with the data clustering structure but also with the reference annotations. When successfully implemented, our method will generally address the computational needs of processing the hundreds of millions of 16S rRNA reads that are currently being generated by large-scale studies.

The second issue concerns developing novel methods to extract pertinent information from massive sequence data, thereby helping the field shift from descriptive research to mechanistic studies. We are particularly interested in microbial community dynamics analysis, which can provide a wealth of insight into disease development that is unattainable through a static experimental design, and which lays a critical foundation for developing probiotic and antibiotic strategies to manipulate microbial communities. Traditionally, system dynamics is approached through time-course studies. However, due to economic and logistical constraints, time-course studies are generally limited in the number of samples examined and the time period followed. With the rapid development of sequencing technology, many thousands of samples are being collected in large-scale studies. This provides us with a unique opportunity to develop a novel analytical strategy that uses static data, instead of time-course data, to study microbial community dynamics. To our knowledge, this is the first time that massive static data is used to study dynamic aspects of microbial communities. When successfully implemented, our approach can effectively overcome the sampling limitation of time-course studies and open a new avenue of research into the microbial dynamics underlying disease development without performing a resource-intensive time-course study.

The proposed pipeline will be intensively tested on a large oral microbiome dataset consisting of ~2,600 subgingival samples (~330M reads). The analysis can significantly advance our understanding of the dynamic behaviors of oral microbial communities that possibly contribute to the development of periodontal disease. To our knowledge, no prior work has been performed on this scale to study oral microbial community dynamics. We have assembled a multidisciplinary team with expertise spanning machine learning, bioinformatics, and oral microbiology. The expected outcome of this work will be a set of computational tools of high utility for the microbiology community and beyond.
