MIM: Machine Learning, Systems Modeling, and Experimental Approaches to Understand the Universal Rules of Life of Microbiota Using Marine Time Series Data
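Basic information
- Award number: 2125142
- Principal investigator:
- Amount: $2,500,700
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-01-01 to 2026-12-31
- Status: Ongoing
- Source:
- Keywords:
Project abstract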
Understanding the relationships among microbial organisms, the functioning of their genes and communities, and the environment, and how these relationships reflect universal rules of life, remains an essential problem in microbiology. The team of investigators is leveraging metagenomic data, the collective DNA content of an entire community, from a marine time series to address these questions. Although the science of metagenomics is over a decade old, many aspects of metagenomic datasets still require new approaches to extract valuable information. The research team will apply state-of-the-art integrative machine learning, systems modeling, and experimental approaches to existing and newly generated time series metagenomic data to better understand the interaction networks in microbial communities and their impacts on microbial community function, which have major implications for understanding the global cycling of elements and the processing of energy in ecosystems. The machine learning and mathematical modeling tools developed in this proposal should provide new avenues for fundamental analysis of metagenomes. The theory and computational tools will also directly benefit the statistics and machine learning communities working on causal inference as well as ecological modeling. Ultimately, these tools will help investigators uncover the universal rules of life within microbiomes from many different environments, including those present in animals and plants. The project will provide interdisciplinary training for postdoctoral fellows and for graduate, undergraduate, and high school students, with emphasis on underrepresented groups, in data science, computer science, statistics, computational biology, environmental biology, and ecology. Software tools developed during the project will be disseminated to the community.
Over the past two decades, the San Pedro Ocean Time-series (SPOT), associated with the University of Southern California Microbial Observatory, has collected time series marker gene, metagenomic, and metatranscriptomic data at different time scales (daily, weekly, monthly, and seasonal) across various depths, locations, and perturbations (pristine and polluted) in the ocean. With this rich time series data, the research team will develop machine learning, systems modeling, and experimental approaches to understand the universal rules of life of microbial communities. The specific aims of this project are to (1) develop machine learning approaches to identify all microbes, known or novel, within the microbial communities, as well as the hosts of mobile genetic elements such as viruses and plasmids, through metagenomic read assembly and binning; (2) further investigate Granger graphical models with knockoff false discovery control, and apply the resulting computational tools to the SPOT data to identify causal relationships among the known microbial genomes, metagenome-assembled genomes, and environmental factors; (3) based on the causal networks constructed in the first two aims, develop mechanistic models of the processes driving organism abundances and community structure, such as competition, cross-feeding, virus-host interactions, grazing, and physical transport, and develop a predictive framework for application to diverse and future ecosystems; and (4) experimentally validate the predicted virus-host interactions using proximity-ligation experiments, along with the dynamics and emerging properties of the microbial communities. User-friendly software packages that automate the procedures for analyzing metagenomic data will be developed.
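To make the idea behind aim (2) concrete, the following is a minimal sketch of a plain pairwise Granger test on two simulated abundance series. It is not the project's Granger graphical model and does not include knockoff false discovery control; it only illustrates the underlying question of whether lagged values of one series improve the prediction of another. The function names `simulate_pair` and `granger_f_stat`, the lag order, and the simulation coefficients are all assumptions made for illustration.

```python
import numpy as np


def simulate_pair(n=300, seed=0):
    """Simulate two series where x drives y with a one-step delay (toy data)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + rng.normal()
        # y depends on its own past and on the past of x.
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()
    return x, y


def granger_f_stat(x, y, lag=1):
    """F statistic for the hypothesis 'x Granger-causes y' at a fixed lag order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k holds the series shifted back by k + 1 time steps.
    y_lags = np.column_stack([y[lag - k - 1 : len(y) - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1 : len(x) - k - 1] for k in range(lag)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, y_lags])       # y predicted from its own past
    full = np.hstack([ones, y_lags, x_lags])     # ... plus the past of x

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted, rss_full = rss(restricted), rss(full)
    dof = n - full.shape[1]
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)


if __name__ == "__main__":
    x, y = simulate_pair()
    print("F(x -> y):", granger_f_stat(x, y))
    print("F(y -> x):", granger_f_stat(y, x))
```

In this toy setup the statistic for x driving y should be much larger than the one for the reverse direction. A real analysis would compare each statistic against an F distribution with `lag` and `dof` degrees of freedom and would need to correct for testing many genome pairs at once, which is the role the abstract assigns to knockoff false discovery control.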
Co-funding for this research was provided by the Biological Oceanography and Mathematical Biology programs. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
HiFine: integrating Hi-C-based and shotgun-based methods to refine binning of metagenomic contigs
- DOI: 10.1093/bioinformatics/btac295
- Published: 2022-05-12
- Journal:
- Impact factor: 5.8
- Authors: Du, Yuxuan; Sun, Fengzhu
- Corresponding author: Sun, Fengzhu
Normalizing Metagenomic Hi-C Data and Detecting Spurious Contacts Using Zero-Inflated Negative Binomial Regression
- DOI: 10.1089/cmb.2021.0439
- Published: 2022-01-12
- Journal:
- Impact factor: 1.7
- Authors: Du, Yuxuan; Laperriere, Sarah M.; Sun, Fengzhu
- Corresponding author: Sun, Fengzhu
Other publications by Fengzhu Sun
HiCzin: Normalizing metagenomic Hi-C data and detecting spurious contacts using zero-inflated negative binomial regression
- DOI: 10.1101/2021.03.01.433489
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Yuxuan Du; S. Laperriere; J. Fuhrman; Fengzhu Sun
- Corresponding author: Fengzhu Sun
On the use of population-based registries in the clinical validation of genetic tests for disease susceptibility
- DOI: 10.1097/00125817-200005000-00005
- Published: 1999
- Journal:
- Impact factor: 8.8
- Authors: Quanhe Yang; M. Khoury; S. Coughlin; Fengzhu Sun; Dana Flanders
- Corresponding author: Dana Flanders
Bidirectional subsethood of shared marker profiles enables accurate virus classification
- DOI: 10.1186/s40168-025-02159-x
- Published: 2025-07-24
- Journal:
- Impact factor: 12.700
- Authors: Christopher Riccardi; Yuqiu Wang; Shibu Yooseph; Fengzhu Sun
- Corresponding author: Fengzhu Sun
Comparison of the effectiveness of different normalization methods for metagenomic cross-study phenotype prediction under heterogeneity
- DOI: 10.1038/s41598-024-57670-2
- Published: 2023
- Journal:
- Impact factor: 4.6
- Authors: Beibei Wang; Fengzhu Sun; Y. Luan
- Corresponding author: Y. Luan
Microsatellite mutations during the polymerase chain reaction: mean field approximations and their applications.
- DOI: 10.1016/s0022-5193(03)00155-3
- Published: 2003
- Journal:
- Impact factor: 2
- Authors: Yinglei Lai; Fengzhu Sun
- Corresponding author: Fengzhu Sun
Other grants awarded to Fengzhu Sun
Inference of Markovian Properties of Molecular Sequences Using Shotgun Reads and Applications
- Award number: 1518001
- Fiscal year: 2015
- Amount: $2,500,700
- Award type: Continuing Grant
Computational and Mathematical Study in Protein Interactions and Functions
- Award number: 0241102
- Fiscal year: 2003
- Amount: $2,500,700
- Award type: Continuing Grant
Similar NSFC grants
Understanding structural evolution of galaxies with machine learning
- Award number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Award type: Provincial or municipal project
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Award number: 2337776
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Continuing Grant
RII Track-4:NSF: Physics-Informed Machine Learning with Organ-on-a-Chip Data for an In-Depth Understanding of Disease Progression and Drug Delivery Dynamics
- Award number: 2327473
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
CC* Campus Compute: UTEP Cyberinfrastructure for Scientific and Machine Learning Applications
- Award number: 2346717
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
Learning to create Intelligent Solutions with Machine Learning and Computer Vision: A Pathway to AI Careers for Diverse High School Students
- Award number: 2342574
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
Collaborative Research: Conference: DESC: Type III: Eco Edge - Advancing Sustainable Machine Learning at the Edge
- Award number: 2342498
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
Excellence in Research:Towards Data and Machine Learning Fairness in Smart Mobility
- Award number: 2401655
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
I-Corps: Translation potential of using machine learning to predict oxaliplatin chemotherapy benefit in early colon cancer
- Award number: 2425300
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
CAREER: Mitigating the Lack of Labeled Training Data in Machine Learning Based on Multi-level Optimization
- Award number: 2339216
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Continuing Grant
Postdoctoral Fellowship: OPP-PRF: Leveraging Community Structure Data and Machine Learning Techniques to Improve Microbial Functional Diversity in an Arctic Ocean Ecosystem Model
- Award number: 2317681
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Standard Grant
Accelerated discovery of ultra-fast ionic conductors with machine learning
- Award number: 24K08582
- Fiscal year: 2024
- Amount: $2,500,700
- Award type: Grant-in-Aid for Scientific Research (C)