CAREER: Scalable binning algorithms for genome-resolved metagenomics

Basic information

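  • Award number:
    1845890
  • Principal investigator:
    Jason Kwan
  • Amount:
    $1.1886 million
  • Host institution:
  • Host institution country:
    United States
  • Program type:
    Continuing Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Period:
    2019-07-01 to 2024-06-30
  • Status:
    Completed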

Project abstract

Almost every environment on Earth is home to communities of microbes, whose metabolism profoundly influences all other life. Understanding such communities is therefore important to many fields, including agriculture, biogeochemistry, oceanography, biology, and ecology. However, the overwhelming majority of environmental microbes have never been isolated and grown in the laboratory. A key to understanding microbial communities is therefore the resolution of genomes from culture-independent sequencing (metagenomics). In this approach, the genomes of individual species are reconstructed from a mixed metagenome in a process called "binning". Inaccurate binning hampers the ability to gain fundamental insights into microbial communities. This happens, for example, when it is uncertain whether a particular genome has been completely recovered from the dataset, or whether all sequences in a bin really come from the same genome. Existing binning methods have several problems in this regard:
(1) poor performance on highly complex metagenomes;
(2) the assumption that all input sequences are bacterial, which leads to impure bins;
(3) a lack of internal validation that bins make biological sense;
(4) disregard for the presence of multiple closely related strains and of genome variability within a species (the "pangenome" concept).
The goal of this project is to develop a binning method that can handle the novelty and complexity of even the most diverse microbial communities on Earth, such as those in soil. The methods developed have the potential to transform the study of microbial communities. They would enable the determination of "who is doing what?" even in highly complex communities associated with higher organisms whose genomes are unknown.
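The two uncertainties described above are commonly quantified as completeness and contamination. Completeness asks whether a genome has been fully recovered. Contamination asks whether a bin mixes sequences from different genomes. Tools such as CheckM popularized estimating both from universally conserved single-copy marker genes. The Python sketch below is a deliberately simplified version of that idea and assumes marker annotations already exist for the bin. CheckM itself uses lineage-specific, collocated marker sets, whereas this sketch counts a flat marker set. It is not the method developed in this project, and the function name `bin_quality` and the marker labels in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bin_quality(marker_hits: Iterable[str], expected_markers: set[str]) -> tuple[float, float]:
    """Estimate (completeness %, contamination %) of one bin from marker-gene hits.

    marker_hits: one entry per marker gene annotated in the bin (repeats allowed).
    expected_markers: markers assumed to occur exactly once per genome.
    Simplified, hypothetical scoring; not CheckM's lineage-specific algorithm.
    """
    if not expected_markers:
        raise ValueError("expected_markers must not be empty")
    # Ignore annotations that are not in the expected single-copy set.
    counts = Counter(m for m in marker_hits if m in expected_markers)
    n = len(expected_markers)
    # Fraction of expected markers seen at least once.
    completeness = 100.0 * len(counts) / n
    # Each extra copy of a supposedly single-copy marker hints at a second genome.
    contamination = 100.0 * sum(c - 1 for c in counts.values()) / n
    return completeness, contamination


if __name__ == "__main__":
    markers = {"rpoB", "gyrB", "recA", "dnaK"}  # illustrative set, not a curated marker list
    print(bin_quality(["rpoB", "rpoB", "gyrB", "recA"], markers))  # (75.0, 25.0)
```

In the example, three of four markers are present (75% complete), and the duplicated `rpoB` contributes one extra copy out of four markers (25% contamination). A bin can therefore look fairly complete while still mixing genomes, which is why both numbers are reported together.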
With increased understanding of microbial communities through genome-resolved metagenomics, it will eventually be possible to model and predict the complex emergent behavior of communities. It will also be possible to delineate how their capabilities differ from those of the component organisms in isolation. This project also aims to address the current shortfall of STEM graduates in the United States, many of whom initially express interest in STEM fields but are then lost to other majors. Research experiences have been shown to help sustain interest in STEM. However, they often reach only a few students and do not necessarily give all students equitable access. To address these problems, this project will establish a course-based undergraduate research experience (CURE) in metagenomic data analysis, aimed at students who have not yet declared a major.

In developing improved binning methods, the research will focus on two aims. The first is developing algorithms that leverage taxonomic information and universally conserved marker genes to bin both well-characterized and divergent species in single samples. The second is developing pangenome-aware algorithms that leverage the co-occurrence of the same species across multiple samples for binning. This approach will maximize the information gleaned from single samples. It will also avoid assumptions about genome conservation within and between samples when leveraging multi-sample data. The methods will be validated using simulated metagenomes as well as real soil data. Specifically, soil communities will be followed longitudinally after forest fires, which kill most microbes; diversity then slowly increases back to baseline. In addition to validating the performance of the binning methods, these data will be used to test various hypotheses about why soil is able to maintain such high levels of microbial diversity.
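Multi-sample binning relies on the observation that contigs from the same genome tend to rise and fall in abundance together across samples. The sketch below illustrates only that basic co-abundance signal. It links contigs whose log-transformed read-depth profiles are strongly Pearson-correlated and reports the connected components as bins. It is an illustrative assumption, not the pangenome-aware algorithm proposed in this project. The function names, the `min_r = 0.9` threshold, and the toy depth values are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-empty vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coabundance_bins(coverage: dict[str, list[float]], min_r: float = 0.9) -> list[set[str]]:
    """Group contigs whose depth profiles across samples are strongly correlated.

    coverage maps a contig ID to its mean read depth in each sample (same sample order).
    Contigs are linked when r >= min_r; bins are connected components (single linkage).
    Hypothetical illustration only; not a pangenome-aware method.
    """
    # log1p damps very high-coverage contigs and tolerates zero depth.
    profiles = {c: [math.log1p(v) for v in cov] for c, cov in coverage.items()}
    parent = {c: c for c in profiles}  # union-find forest over contig IDs

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    # All-pairs comparison: fine for a toy example, quadratic in the number of contigs.
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= min_r:
            parent[find(a)] = find(b)

    bins: dict[str, set[str]] = {}
    for c in profiles:
        bins.setdefault(find(c), set()).add(c)
    return list(bins.values())


if __name__ == "__main__":
    depths = {  # toy depths in three samples
        "c1": [10, 20, 40], "c2": [11, 21, 39],  # both increase across samples
        "c3": [50, 5, 1], "c4": [48, 6, 2],      # both decrease across samples
    }
    print(sorted(sorted(b) for b in coabundance_bins(depths)))  # [['c1', 'c2'], ['c3', 'c4']]
```

Single linkage can chain unrelated contigs together, and the all-pairs loop does not scale to large assemblies. Practical binners therefore usually combine coverage with sequence composition and use more scalable clustering. The sketch sets those concerns aside for readability.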
The CURE course designed and implemented during this project will develop skills in "big data" analysis, which are becoming increasingly important in all branches of science. It will also expose students to the analysis of shotgun metagenome data. The course will synergize with the highly popular Tiny Earth course, which is currently offered to nearly 10,000 students in 41 US states and 14 countries. The Tiny Earth network will act as a central repository for data, which institutions that lack the resources for shotgun sequencing will be able to use. Over time, this data repository will facilitate the exploration of "big questions", thus crowdsourcing student-led discovery. For further information, visit http://jason-c-kwan.github.io/CAREER_results.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Structural Dynamics and Molecular Evolution of the SARS-CoV-2 Spike Protein.
  • DOI:
    10.1128/mbio.02030-21
  • Published:
    2022-04-26
  • Journal:
    mBio
  • Impact factor:
    6.4
  • Authors:
  • Corresponding author:

Other publications by Jason Kwan


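Similar NSFC (National Natural Science Foundation of China) grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Award number:
  • Approval year:
    2024
  • Amount (10,000 CNY):
  • Program type:
    Collaborative Innovation Research Team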

Similar overseas grants

Scalable indoor power harvesters using halide perovskites
  • Award number:
    MR/Y011686/1
  • Fiscal year:
    2025
  • Amount:
    $1.1886 million
  • Program type:
    Fellowship
RestoreDNA: Development of scalable eDNA-based solutions for biodiversity regulators and nature-related disclosure
  • Award number:
    10086990
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Collaborative R&D
Scalable and Automated Tuning of Spin-based Quantum Computer Architectures
  • Award number:
    2887634
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Studentship
DREAM Sentinels: Multiplexable and programmable cell-free ADAR-mediated RNA sensing platform (cfRADAR) for quick and scalable response to emergent viral threats
  • Award number:
    2319913
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Standard Grant
Collaborative Research: Scalable Nanomanufacturing of Perovskite-Analogue Nanocrystals via Continuous Flow Reactors
  • Award number:
    2315997
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Standard Grant
CAREER: Scalable Physics-Inspired Ising Computing for Combinatorial Optimizations
  • Award number:
    2340453
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Continuing Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
  • Award number:
    2412357
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Standard Grant
SHF: Small: QED - A New Approach to Scalable Verification of Hardware Memory Consistency
  • Award number:
    2332891
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Standard Grant
SBIR Phase I: Scalable Magnetically-Geared Modular Space Manipulator for In-space Manufacturing and Active Debris Remediation Missions
  • Award number:
    2335583
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Standard Grant
CC* Networking Infrastructure: Building a Scalable and Polymorphic Cyberinfrastructure for Diverse Research and Education Needs at Illinois State University
  • Award number:
    2346712
  • Fiscal year:
    2024
  • Amount:
    $1.1886 million
  • Program type:
    Standard Grant