Developing Advanced Algorithms to Address Major Computational Challenges in Current Microbiome Research
Basic information
- Grant number: 9270498
- Principal investigator:
- Amount: $311,200
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2016
- Funding country: United States
- Project period: 2016-05-15 to 2019-04-30
- Project status: Completed
- Source:
- Keywords: Address; Algorithms; Antibiotics; Area; Big Data; Bioinformatics; Biological; Communities; Computational algorithm; Computer software; Data; Data Analyses; Data Set; Development; Disease; Epidemiology; Floods; Foundations; Health; Human; Human Microbiome; Human body; Interdisciplinary Study; Knowledge; Logistics; Machine Learning; Metagenomics; Methods; Microbe; Microbiology; Modeling; Oral; Oral Microbiology; Outcome; Periodontal Diseases; Physiological Processes; Play; Probiotics; Research; Resources; Ribosomal RNA; Role; Sampling; Structure; System; Taxonomy; Technology; Testing; Time; Work; base; cohort; computerized tools; design; dynamic system; epidemiology study; experimental study; innovation; insight; interest; microbial; microbial community; microbiome; microbiota; multidisciplinary; novel; open source; oral behavior; oral microbiome; response; tumor progression; web app
Project abstract
We propose a three-year interdisciplinary research plan to address two key issues currently facing the metagenomics community.
The first issue concerns the accurate construction and annotation of OTU tables from millions of 16S rRNA sequences, one of the most important yet most difficult problems in microbiome data analysis. The field currently lacks computational algorithms capable of handling extremely large sequence datasets and constructing biologically consistent OTU tables. We propose a novel method that performs OTU table construction and annotation simultaneously by combining input and reference sequences, reference annotations, and data clustering structure within one analytical framework. Dynamic, data-driven cutoffs are derived to identify OTUs that are consistent not only with the data clustering structure but also with the reference annotations. When successfully implemented, our method will broadly address the computational needs of processing the hundreds of millions of 16S rRNA reads currently being generated by large-scale studies.
The second issue concerns developing novel methods to extract pertinent information from massive sequence data, thereby helping the field shift from descriptive research to mechanistic studies. We are particularly interested in microbial community dynamics analysis, which can provide a wealth of insight into disease development that is unattainable through a static experimental design, and which lays a critical foundation for developing probiotic and antibiotic strategies to manipulate microbial communities. Traditionally, system dynamics is approached through time-course studies. However, owing to economic and logistical constraints, time-course studies are generally limited in the number of samples examined and the time period followed. With the rapid development of sequencing technology, many thousands of samples are being collected in large-scale studies. This provides a unique opportunity to develop a novel analytical strategy that uses static data, instead of time-course data, to study microbial community dynamics. To our knowledge, this is the first time that massive static data have been used to study dynamic aspects of microbial communities. When successfully implemented, our approach can effectively overcome the sampling limitation of time-course studies and open a new avenue of research into the microbial dynamics underlying disease development without requiring a resource-intensive time-course study.
The proposed pipeline will be intensively tested on a large oral microbiome dataset consisting of ~2,600 subgingival samples (~330M reads). The analysis can significantly advance our understanding of the dynamic behaviors of oral microbial communities that possibly contribute to the development of periodontal disease. To our knowledge, no prior work has been performed on this scale to study oral microbial community dynamics. We have assembled a multidisciplinary team with expertise spanning machine learning, bioinformatics, and oral microbiology. The expected outcome of this work is a set of computational tools of high utility for the microbiology community and beyond.
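As a rough illustration of what an OTU table is, the sketch below clusters reads greedily at a fixed identity cutoff and counts reads per sample. It is a generic baseline, not the method proposed in the abstract, which derives dynamic, data-driven cutoffs and also uses reference annotations. The function names, the toy position-wise identity measure, and the 0.85 threshold are all assumptions made for illustration.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions (assumes equal-length reads)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_otu_table(samples, threshold=0.85):
    """Greedy centroid clustering at a fixed identity cutoff (illustrative only).

    samples: dict mapping sample name -> list of reads.
    Returns (centroids, table), where table[sample][otu_index] is a read count.
    """
    # Visit unique reads from most to least abundant, so that abundant
    # reads are the ones that seed new OTUs.
    abundance = Counter(read for reads in samples.values() for read in reads)
    centroids = []
    assignment = {}
    for read, _count in abundance.most_common():
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                assignment[read] = idx  # join the first OTU that is close enough
                break
        else:
            centroids.append(read)  # no match: this read seeds a new OTU
            assignment[read] = len(centroids) - 1
    # Count reads per OTU within each sample.
    table = {
        name: Counter(assignment[read] for read in reads)
        for name, reads in samples.items()
    }
    return centroids, table


if __name__ == "__main__":
    toy_samples = {
        "s1": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"],
        "s2": ["ACGTACGTAC", "TTTTTTTTTT", "TTTTTTTTTT"],
    }
    otus, counts = build_otu_table(toy_samples)
    for sample, row in counts.items():
        print(sample, dict(row))
```

A fixed cutoff like this is simple but can split or merge taxa inconsistently with known annotations, which is the kind of limitation the proposed data-driven cutoffs are meant to address.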
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Yijun Sun
Similar overseas grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Amount: $311,200
- Project type: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Amount: $311,200
- Project type: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Amount: $311,200
- Project type: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Amount: $311,200
- Project type: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Amount: $311,200
- Project type: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Amount: $311,200
- Project type: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Amount: $311,200
- Project type: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Amount: $311,200
- Project type: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Amount: $311,200
- Project type: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Amount: $311,200
- Project type: Continuing Grant