Project Harvester: Improving molecular fingerprint prediction through self-training
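Basic information
- Grant number: 518231245
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research Grants
- Fiscal year:
- Funding country: Germany
- Project period:
- Project status: Ongoing
- Source:
- Keywords: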
Project abstract
Rapid annotation of small molecules is of interest in numerous areas of biology and the life sciences. Mass spectrometry (MS) is a key technology for annotating small molecules from small amounts of sample, and structural elucidation of small molecules is usually carried out using tandem mass spectrometry (MS/MS). Computational analysis of MS/MS data is one of the major technological hurdles in metabolomics and small molecule research today. In 2015, my group developed CSI:FingerID for searching MS/MS data in molecular structure databases. Later, we developed CANOPUS for the comprehensive assignment of compound classes without the need for structural elucidation. In 2021, we published the COSMIC workflow, which allows us to differentiate between correct and incorrect annotations. All of these methods depend on MS/MS data to train the underlying machine learning models. Unfortunately, available reference MS/MS libraries are growing slowly, much more slowly than structure databases or publicly available biological data.
The fundamental objective of this project is to harness the publicly available biological data to improve our machine learning models. The prediction of molecular fingerprints from MS/MS data of small molecules lies at the heart of many computational methods such as CSI:FingerID, CANOPUS and MSNovelist. The goal of this project is to substantially improve fingerprint prediction performance through self-training, making use of the billions of unlabeled spectra from small molecules available in public repositories. We will process hundreds of thousands of LC-MS/MS runs publicly available in repositories such as GNPS, find high-confidence annotations, feed those annotated MS/MS spectra back into the training data for fingerprint prediction, and repeat until convergence.
The impact of our project will be two-fold. First, we can improve the performance of all methods that rely on fingerprint prediction, including CSI:FingerID, CANOPUS and MSNovelist. Second, our project will provide a large public library of MS/MS spectra with putative molecular structure annotations. This will not only allow others to train better machine learning models (for example, for Competitive Fragmentation Modeling, CFM) but will also be of value for computational method development in general.
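The self-training loop described in the abstract (annotate unlabeled data, keep only high-confidence annotations, retrain, repeat until nothing new is accepted) can be illustrated with a toy sketch. This is not the project's pipeline: a nearest-centroid classifier on 2-D points stands in for the fingerprint predictor, a relative distance margin stands in for a confidence score, and every name here (`fit_centroids`, `predict`, `self_train`, `threshold`, `max_rounds`) is hypothetical.

```python
import math


def fit_centroids(samples):
    """Train the toy model: one centroid per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return (label, confidence). Assumes at least two classes.

    Confidence is the relative margin between the two nearest centroids,
    a stand-in (assumption) for a real annotation confidence score.
    """
    dists = sorted((math.dist(vec, c), lab) for lab, c in centroids.items())
    (d1, best), (d2, _) = dists[0], dists[1]
    conf = (d2 - d1) / (d2 + d1) if d2 + d1 > 0 else 0.0
    return best, conf


def self_train(labeled, unlabeled, threshold=0.5, max_rounds=10):
    """Grow the training set with confident pseudo-labels until convergence."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(train)
        accepted, remaining = [], []
        for vec in pool:
            label, conf = predict(centroids, vec)
            if conf >= threshold:
                accepted.append((vec, label))  # confident: use as training data
            else:
                remaining.append(vec)  # uncertain: keep for a later round
        if not accepted:
            break  # converged: no new confident annotations
        train.extend(accepted)
        pool = remaining
    return fit_centroids(train), train, pool


labeled = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (2.6, 2.6), (5.0, 5.0)]
centroids, train, pool = self_train(labeled, unlabeled, threshold=0.5)
print(len(train), pool)
```

In this toy run the point (2.6, 2.6) falls below the threshold in the first round and is accepted only after the model has been retrained on the first batch of pseudo-labels, which is the effect self-training relies on. The ambiguous point (5.0, 5.0) is never accepted and stays unlabeled.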
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Professor Dr. Sebastian Böcker
Transferable retention time prediction for Liquid Chromatography-Mass Spectrometry-based metabolomics
- Grant number: 425789784
- Fiscal year: 2019
- Funding amount: --
- Project category: Research Grants
Identifying the unknowns: towards structural elucidation of small molecules using mass spectrometry
- Grant number: 242259350
- Fiscal year: 2013
- Funding amount: --
- Project category: Research Grants
FlipCut Supertrees: Große und akkurate Phylogenien schneller bestimmen
FlipCut Supertrees: Determining large and accurate phylogenies faster
- Grant number: 211926079
- Fiscal year: 2012
- Funding amount: --
- Project category: Research Grants
Algorithms for the Analysis of Approximate Gene Cluster (3AGC)
- Grant number: 156864160
- Fiscal year: 2010
- Funding amount: --
- Project category: Research Grants
Identifying the unknowns: towards structural elucidation of small molecules using mass spectrometry
- Grant number: 164582891
- Fiscal year: 2010
- Funding amount: --
- Project category: Research Grants
Parameterized Algorithmics for Bioinformatics
- Grant number: 162571619
- Fiscal year: 2009
- Funding amount: --
- Project category: Research Grants
Informatische Methoden für Massenspektrometrie in der Genomik
Computational methods for mass spectrometry in genomics
- Grant number: 5400926
- Fiscal year: 2003
- Funding amount: --
- Project category: Independent Junior Research Groups
Identifying the Unknowns: Fragmentation Trees and Molecular Fingerprints
- Grant number: 324792648
- Fiscal year:
- Funding amount: --
- Project category: Research Grants
Similar international grants
CAREER: A Universal Microsystem-based Vibration Energy Harvester
- Grant number: 2237086
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Development of an energy harvester that obtains power from multiple power sources
- Grant number: 23K03953
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
Charge-Free Electrostatic MEMS Vibration Energy Harvester for Sensor/LSI Integration
- Grant number: 22H01929
- Fiscal year: 2022
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (B)
Wild Rice Harvester Technology Evaluation & Improvements
- Grant number: 561849-2021
- Fiscal year: 2021
- Funding amount: --
- Project category: Applied Research and Development Grants - Level 1
Membranous Energy Harvester with Tuning Capability for Flexible Electronics
- Grant number: 2106459
- Fiscal year: 2021
- Funding amount: --
- Project category: Standard Grant
Gel energy harvester using ionic thermoelectric effect by inclusion-dissociation phenomena
- Grant number: 21K18995
- Fiscal year: 2021
- Funding amount: --
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
PFAS Harvester: A Technology for Destruction / Resource Recovery from PFAS
- Grant number: SR180200059
- Fiscal year: 2020
- Funding amount: --
- Project category: Special Research Initiatives
Development of triboelectric based energy harvester for wearable applications
- Grant number: 20K20012
- Fiscal year: 2020
- Funding amount: --
- Project category: Grant-in-Aid for Early-Career Scientists