Collaborative Research: EAGER: Solving the bait learning problem for large-scale DNA enrichment

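Basic information

  • Award number:
    2118251
  • Principal investigator:
  • Amount:
    $159,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-09-01 to 2024-08-31
  • Project status:
    Completed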

Project abstract

The microbiome refers to all micro-organisms within a biological sample and has been linked to numerous biological activities and phenotypes. For example, the “gut microbiome”, which commonly refers to the bacteria found within the human digestive system, has been linked to many phenotypes and diseases, including obesity, Alzheimer’s disease, autism spectrum disorder, and some types of cancer. Similarly, the soil microbiome has been linked to drought tolerance, flowering time, and pesticide resistance in plants. Comprehensively studying the micro-organisms in a biological sample is challenging because an estimated 95% to 99% of them cannot live outside their natural environments and therefore cannot be isolated in a laboratory. Fortunately, shotgun metagenomics can address this challenge: it takes as input the DNA of all the micro-organisms within a sample and produces the corresponding DNA strings (called “sequence reads”). These sequence reads are then analyzed further to identify and study the micro-organisms. Frequently, however, scientists are interested in studying a limited number of micro-organisms rather than all of them. For example, when studying the micro-organisms from a respiratory swab, only the sequence reads corresponding to the virus that causes COVID-19 may be of interest, yet shotgun metagenomics will produce sequence reads for all DNA found on the swab. Fortunately, there are laboratory methods capable of restricting the sequencing to a selected set of DNA sequences, which is referred to as DNA enrichment. DNA enrichment creates a set of short, synthetic DNA fragments (called “baits”) and applies them to a biological sample, where they bind only to selected portions of the DNA. The unbound DNA is then rinsed away, leaving only the bound DNA, which is then sequenced.
Hence, the first step of this process is the informatics problem of identifying the baits that will enrich for a given set of DNA sequences. Two things make this informatics problem challenging: the bait set should be as small as possible, and a bait does not bind exactly to a single DNA sequence but can bind to many sequences with some allowable mismatches. This project will combine techniques from informatics, machine learning, and data integration to create practical methods for designing baits. While existing heuristics have demonstrated the utility of DNA enrichment, no existing method can do this efficiently for large DNA databases. Hence, one focus of this project is to create scalable methods. To accomplish this, we will develop methods for solving two informatics problems: (1) the Bait Minimization problem, which aims to identify the smallest set of baits that enriches for the entire set of DNA sequences, and (2) the DNA Sequence Maximization problem, which aims to select the largest subset of DNA sequences that can be enriched using a bait set of restricted size. Here, we will integrate information about the biological process into the problem formulations and solve the problems using state-of-the-art informatics approaches. This project will result in several innovations that will have a major impact on informatics, data science, and other scientific disciplines. The broader impact of this work will encompass furthering our knowledge of micro-organisms and creating curriculum for Girls Engaged in Engineering Days at the University of Florida. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
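
To make the Bait Minimization problem concrete, the following is a minimal sketch of a left-to-right greedy heuristic. It is an illustration only, not the method developed in this project. It assumes that binding can be modeled as Hamming distance at most `max_mismatches` between a bait and an equal-length window of a sequence, that every sequence is at least `bait_len` long, and that reverse complements are ignored. The function names `hamming`, `mark_covered`, and `greedy_baits` are hypothetical.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mark_covered(bait: str, seq: str, covered: List[bool], max_mismatches: int) -> None:
    """Mark every position of seq lying under a window the bait can bind to."""
    n = len(bait)
    for i in range(len(seq) - n + 1):
        if hamming(bait, seq[i:i + n]) <= max_mismatches:
            for j in range(i, i + n):
                covered[j] = True


def greedy_baits(
    seqs: List[str], bait_len: int, max_mismatches: int
) -> Tuple[List[str], List[List[bool]]]:
    """Scan each sequence left to right; whenever a position is not yet
    covered, cut a new bait from the sequence at that position and mark
    everything that bait can bind to, in all sequences, as covered.

    Assumption (not from the source): each sequence has length >= bait_len.
    The result covers every position but is not guaranteed to be minimal.
    """
    covered = [[False] * len(s) for s in seqs]
    baits: List[str] = []
    for s_idx, s in enumerate(seqs):
        for pos in range(len(s)):
            if covered[s_idx][pos]:
                continue
            # Shift the window left near the end so the bait stays full length.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            for flags, target in zip(covered, seqs):
                mark_covered(bait, target, flags, max_mismatches)
    return baits, covered


# Two toy sequences that differ at a single position.
toy = ["ACGTACGTAC", "ACGTTCGTAC"]
baits_mm1, cov_mm1 = greedy_baits(toy, bait_len=5, max_mismatches=1)
baits_mm0, cov_mm0 = greedy_baits(toy, bait_len=5, max_mismatches=0)
```

On this toy input, allowing one mismatch lets two baits cover both sequences, while exact matching needs three. This is the reason tolerated mismatches matter for keeping the bait set small. The quadratic window scan in `mark_covered` would not scale to large DNA databases, which is the gap the abstract describes.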

Project outcomes

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MEGARes and AMR++, v3.0: an updated comprehensive database of antimicrobial resistance determinants and an improved software pipeline for classification using high-throughput sequencing.
  • DOI:
    10.1093/nar/gkac1047
  • Publication date:
    2023-01-06
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Bonin, Nathalie;Doster, Enrique;Worley, Hannah;Pinnell, Lee J.;Bravo, Jonathan E.;Ferm, Peter;Marini, Simone;Prosperi, Mattia;Noyes, Noelle;Morley, Paul S.;Boucher, Christina
  • Corresponding author:
    Boucher, Christina
Computing Maximal Unique Matches with the r-index
Syotti: scalable bait design for DNA enrichment.
Target-enriched long-read sequencing (TELSeq) contextualizes antimicrobial resistance genes in metagenomes.
  • DOI:
    10.1186/s40168-022-01368-y
  • Publication date:
    2022-11-02
  • Journal:
  • Impact factor:
    15.5
  • Authors:
    Slizovskiy, Ilya B.;Oliva, Marco;Settle, Jonathen K.;Zyskina, Lidiya, V;Prosperi, Mattia;Boucher, Christina;Noyes, Noelle R.
  • Corresponding author:
    Noyes, Noelle R.

Other publications by Christina Boucher

ONeSAMP 3.0: Effective Population Size via SNP Data for One Population Sample
  • DOI:
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Aaron Hong;R. G. Cheek;Kingshuk Mukherjee;Isha Yooseph;Marco Oliva;Mark Heim;W. C. Funk;David Tallmon;Christina Boucher
  • Corresponding author:
    Christina Boucher
Data Structures for SMEM-Finding in the PBWT
  • DOI:
    10.1007/978-3-031-43980-3_8
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    5.4
  • Authors:
    Paola Bonizzoni;Christina Boucher;D. Cozzi;Travis Gagie;Dominik Köppl;Massimiliano Rossi
  • Corresponding author:
    Massimiliano Rossi
A study at the wildlife-livestock interface unveils the potential of feral swine as a reservoir for extended-spectrum β-lactamase-producing Escherichia coli
  • DOI:
    10.1016/j.jhazmat.2024.134694
  • Publication date:
    2024-07-15
  • Journal:
  • Impact factor:
    11.300
  • Authors:
    Ting Liu;Shinyoung Lee;Miju Kim;Peixin Fan;Raoul K. Boughton;Christina Boucher;Kwangcheol C. Jeong
  • Corresponding author:
    Kwangcheol C. Jeong
A comparative study of antibiotic resistance patterns in Mycobacterium tuberculosis
  • DOI:
    10.1038/s41598-025-89087-w
  • Publication date:
    2025-02-11
  • Journal:
  • Impact factor:
    3.900
  • Authors:
    Mohammadali Serajian;Conrad Testagrose;Mattia Prosperi;Christina Boucher
  • Corresponding author:
    Christina Boucher
Solving the Minimal Positional Substring Cover Problem in Sublinear Space

Other grants by Christina Boucher

SCH: INT: Enabling real time surveillance of antimicrobial resistance
  • Award number:
    2013998
  • Fiscal year:
    2021
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
IIBR Informatics: An Efficient Pangenomics Graph Aligner
  • Award number:
    2029552
  • Fiscal year:
    2020
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
III: Small: Collaborative Research: A Scalable and Efficient Optical Map Assembler
  • Award number:
    1618814
  • Fiscal year:
    2016
  • Award amount:
    $159,000
  • Project type:
    Standard Grant

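Similar National Natural Science Foundation of China (NSFC) grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Award amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Award amount:
    CNY 450,000
  • Project type:
    General Program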

Similar overseas grants

Collaborative Research: EAGER: The next crisis for coral reefs is how to study vanishing coral species; AUVs equipped with AI may be the only tool for the job
  • Award number:
    2333604
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
  • Award number:
    2347624
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
EAGER/Collaborative Research: Revealing the Physical Mechanisms Underlying the Extraordinary Stability of Flying Insects
  • Award number:
    2344215
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
  • Award number:
    2345581
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
  • Award number:
    2345582
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
  • Award number:
    2345583
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
Collaborative Research: EAGER: Energy for persistent sensing of carbon dioxide under near shore waves.
  • Award number:
    2339062
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
Collaborative Research: EAGER: IMPRESS-U: Groundwater Resilience Assessment through iNtegrated Data Exploration for Ukraine (GRANDE-U)
  • Award number:
    2409395
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
Collaborative Research: EAGER: The next crisis for coral reefs is how to study vanishing coral species; AUVs equipped with AI may be the only tool for the job
  • Award number:
    2333603
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
  • Award number:
    2347623
  • Fiscal year:
    2024
  • Award amount:
    $159,000
  • Project type:
    Standard Grant