Machine learning approaches for faster discovery and adaptation of enzymes for difficult chemical reactions (MacBioSyn). Part I: providing solutions for regioselective oxygenations by 2OGD oxidases


Basic information

Project abstract

Biocatalytic synthesis of chemicals is considered a keystone of future green and sustainable chemistry, and the European Commission highlights it in particular in combination with digital transformation (Green Deal). However, its potential is far from being realized in industry today, mainly because of the limited activity or diversity of accessible enzymes. 2-Oxoglutarate-dependent (2OGD) proteins are an under-researched family of enzymes that catalyze “tricky” oxidative reactions (e.g., oxyfunctionalization of non-activated carbons, demethylations) that are challenging or impossible to perform by traditional chemical synthesis. Thus, 2OGD proteins have high potential to revolutionize the industry as a regio- and product-specific “alternative to chemistry”. Identifying new representatives of this large family with, for example, an increased substrate scope can open a new range of biocatalytic routes, e.g., to natural products. However, a common challenge in enzyme development is predicting activity while exploring the enormous biodiversity through genome mining. Machine learning (ML) can capitalize on large and diverse enzyme datasets to predict function and activity, and explore the biodiversity to identify advanced biocatalysts. Additionally, ML methods can optimize multiple protein traits simultaneously and navigate sequence and chemical space efficiently.
In the MacBioSyn project, we aim to develop a general, high-throughput (HT), ML-based framework (deep learning, active learning, reinforcement learning) that predicts the activity of enzymes and their substrate/reaction scope. We will implement a new in silico framework for the analysis of enzyme sequence/substrate pairs, based on ML models trained on screening results. The approach is synergistic, combining interdisciplinary expertise in computational design and modeling (Davari) with HT enzyme characterization (Dippe/Wessjohann).
To establish this platform, we will focus on 2OGD enzymes as proof-of-concept biocatalysts. A critical challenge in machine learning projects is the size of the dataset available for training, which determines whether consistent and reliable statistical models can be established. Therefore, MacBioSyn aims to generate a large dataset by HT screening (> 1500 enzymes) of the superfamily’s biodiversity. Conversion of 30 structurally diverse substrates will generate representative data to train our algorithms in an iterative process. In essence, our framework will provide a solution for activity/substrate scope prediction for biocatalyst discovery in general. The synergistic approach will provide methodologies that harness ML methods to accelerate the discovery of improved enzymes, i.e., the way biocatalytic reactions (here oxyfunctionalizations) are developed. The new fundamental design principles learned for 2OGD enzymes will broaden their applications in the biocatalytic production of valuable natural products and beyond.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants of Dr. Mehdi Davari Dolatabadi, Ph.D.

Understanding the sequence-structure-function relationship of the large arylsulfate sulfotransferase (ASST) enzyme family for engineering novel sulfation biocatalysts
  • Grant number:
    505682627
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Research Grants

Similar NSFC (National Natural Science Foundation of China) grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
    --
  • Project category:
    Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
  • Grant number:
    n/a
  • Approval year:
    2022
  • Funding amount:
    100,000 CNY
  • Project category:
    Provincial/Municipal Project
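Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowdsensing tasks for coal mine safety
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    300,000 CNY
  • Project category:
    Young Scientists Fund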
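Short-time online Q-learning cooperative control mechanism for intelligent munition formations considering leader failure
  • Grant number:
    62003314
  • Approval year:
    2020
  • Funding amount:
    240,000 CNY
  • Project category:
    Young Scientists Fund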
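Research on e-learning resource recommendation methods integrating contextual tensor factorization
  • Grant number:
    61902016
  • Approval year:
    2019
  • Funding amount:
    240,000 CNY
  • Project category:
    Young Scientists Fund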
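Effects of children's musical ability development on language and social cognition abilities and brain development
  • Grant number:
    31971003
  • Approval year:
    2019
  • Funding amount:
    580,000 CNY
  • Project category:
    General Program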
Research on Spiking-Transfer learning methods with temporal transfer capability
  • Grant number:
    61806040
  • Approval year:
    2018
  • Funding amount:
    200,000 CNY
  • Project category:
    Young Scientists Fund
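Research on deep-learning-based dynamic identification technology for glacier monitoring in the Three-River Headwaters (Sanjiangyuan) region
  • Grant number:
    51769027
  • Approval year:
    2017
  • Funding amount:
    380,000 CNY
  • Project category:
    Regional Science Fund Project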
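Research on key technologies for learner interest mining based on joint behavior-emotion-topic modeling in multi-scenario online learning
  • Grant number:
    61702207
  • Approval year:
    2017
  • Funding amount:
    210,000 CNY
  • Project category:
    Young Scientists Fund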
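Deep mining techniques for heterogeneous medical imaging data and precise prediction of major central nervous system diseases
  • Grant number:
    61672236
  • Approval year:
    2016
  • Funding amount:
    640,000 CNY
  • Project category:
    General Program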

Similar overseas grants

Automating data acquisition and data processing pipeline via artificial intelligence and machine learning approaches to allow at-home use of a novel breast cancer screening method employing bra-based elastography imaging.
  • Grant number:
    486956
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Operating Grants
Developing machine learning based approaches to weld residual stress problems
  • Grant number:
    2894296
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Studentship
Determining the ototoxic potential of COVID-19 therapeutics using machine learning and in vivo approaches
  • Grant number:
    10732745
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Research Initiation Award: Uncovering and Extracting Biological Information from Nanopore Long-read Sequencing Data with Machine Learning and Mathematical Approaches
  • Grant number:
    2300445
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Standard Grant
CAREER: Combining Machine Learning and Physics-based Modeling Approaches for Accelerating Scientific Discovery
  • Grant number:
    2239175
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Continuing Grant
Developing novel machine learning approaches to studying cell development
  • Grant number:
    2326879
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Continuing Grant
Improving aerosol and spray process computational fluid dynamics models with machine learning approaches
  • Grant number:
    2881557
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Studentship
Target identification from multiomics data using systems biology and machine learning approaches
  • Grant number:
    BB/Y512734/1
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Training Grant
Cheminformatics and Machine Learning approaches for GPCR Computer Aided Drug Design
  • Grant number:
    BB/X511778/1
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Training Grant
Constructing a Digital Twin for a self-correcting Scanning Transmission Electron Microscope using Machine Learning Approaches
  • Grant number:
    2889721
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Studentship