Development Of Theoretical Methods For Studying Biological Macromolecules

Basic Information

Project Abstract

pKa prediction by machine learning

Machine learning techniques have developed rapidly in recent years and have been applied to numerous scientific fields. Previously, we presented four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA. Error analyses showed that coupled ionizable residues are the most difficult cases for pKa prediction. To curate a more reliable pKa database, we collected detailed information on the PDB structures (e.g., cofactors and whether the protein is a membrane protein), which could serve as new features in the next version of our pKa predictor. In addition, pKa values calculated with continuum electrostatics will be added to the model as a feature. This feature will ensure that long-range electrostatics and the hydrophobic effect are included. Additional experimental pKa data were found in the literature and added to the new database. We also examined the experimental conditions of both the PDB structure determinations and the pKa measurements to check whether the structure-pKa pairs are compatible.

Conceptual DFT for studying catalytic reactions in metalloproteins

Metalloenzymes play a crucial role in maintaining human health by participating in biochemical processes that are vital for various physiological functions. These specialized enzymes require metal ions as cofactors to catalyze specific reactions with remarkable efficiency and specificity. Understanding the electronic structure of the active site helps reveal reaction mechanisms. In this project we use Conceptual DFT (CDFT) to study the electronic structure of the metal active sites in Photosystem II.
CDFT uses quantum mechanical calculations to understand and predict the electronic structure and the intermolecular and intramolecular reactivity of molecules. It extends DFT by introducing quantum descriptors, which describe the tendency of a system to donate or accept electrons; they include electronegativity, hardness, softness, the Fukui functions, and the dual descriptor. The local versions of these descriptors are used to predict the reactive sites within a molecule. For example, condensed Fukui functions and dual descriptors have been used to identify the atoms within a molecule that participate in nucleophilic or electrophilic attack. Our results show that the metal cluster in Photosystem II divides the oxygen ligands into nucleophilic and electrophilic ligands to reduce the barrier to O=O bond formation.

Machine learning models for predicting oxidation states of metals in proteins

Imaging metalloenzymes with X-rays causes radiation damage, which changes the physical and chemical properties of the metal active site. However, the catalytic reactions in metalloproteins are usually coupled to the metal oxidation states. Thus, identifying the correct oxidation states of the metals is essential for understanding the reaction mechanisms. We built a large dataset of small coordination compounds collected from the Cambridge Crystallographic Data Centre (CCDC) for different metals (Fe, Mn, Cu, Co) in different oxidation states and used it to train machine learning models (decision tree and neural network classifiers) that predict the oxidation states of the metals in protein structures determined by X-ray and XFEL crystallography.

Equivariant graph neural network based electrostatic embedding in QM/MM simulations

We present a novel methodology for efficiently and accurately integrating classical force fields with Quantum Mechanical (QM) theory in additive Quantum Mechanics/Molecular Mechanics (QM/MM) simulations.
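For orientation, the additive QM/MM energy and the conventional electrostatic-embedding operator can be written as below. This is the standard textbook form in atomic units, not the neural-network model presented in this project. The notation is assumed for this sketch: q_A is an MM point charge at position R_A, Z_α is a QM nuclear charge at R_α, and r_i is the position of electron i.

```latex
E_{\mathrm{total}} = E_{\mathrm{QM}} + E_{\mathrm{MM}} + E_{\mathrm{QM/MM}}

\hat{H}_{\mathrm{emb}} =
  -\sum_{i \in \mathrm{elec}} \sum_{A \in \mathrm{MM}}
     \frac{q_A}{\lvert \mathbf{r}_i - \mathbf{R}_A \rvert}
  + \sum_{\alpha \in \mathrm{QM}} \sum_{A \in \mathrm{MM}}
     \frac{Z_\alpha \, q_A}{\lvert \mathbf{R}_\alpha - \mathbf{R}_A \rvert}
```

The first sum is a one-electron operator. Only QM-MM pairs appear in either sum; the coupling contains no MM-MM terms.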
We design sparse E(3)-equivariant neural networks with a sparse connection scheme that emulates the one-electron integrals in the ab initio QM and MM regions of the calculation. Specifically, point charges from the MM region exchange messages solely with the QM region, which eliminates direct interactions among the point charges themselves. To validate the methodology, we conduct ablation studies and compare it with prominent alternatives, including High-Dimensional Neural Networks (HDNNs) and graph convolutional networks. Our findings demonstrate that our approach not only exhibits superior data efficiency but also outperforms HDNNs and graph convolutional networks by an order of magnitude in computational efficiency. These advances hold promise for applications in areas such as materials science, chemistry, and drug discovery.

bEDS for evaluation of free energies

Free energies of binding and solvation are key quantities in modern drug design, and evaluating them accurately and rapidly remains an ongoing challenge. Taking inspiration from the Enveloping Distribution Sampling (EDS) method, we developed the bridge-EDS (bEDS) method. EDS defines a smoothed potential energy function to overcome barriers on the potential energy surface (PES). bEDS applies this smoothing to all alchemical states typically used for solvation or ligand-binding processes. The simulation thus samples a phase space with better overlap among alchemical states, free of PES energy barriers, and therefore requires shorter overall simulations. Thanks to our apoCHARMM code architecture and its multistate handling capability, the energies of all alchemical states are computed on the fly at no extra cost and the post-processing effort becomes negligible, making bEDS a valuable asset for free energy prediction.
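The smoothed potential that EDS defines is commonly written as V_R = -(1/(βs)) ln Σ_i exp(-βs (V_i - E_i)), where V_i are the end-state energies, E_i are energy offsets, and s is a smoothness parameter. The sketch below evaluates that generic EDS expression for a single configuration. It does not reproduce the bEDS or apoCHARMM implementation, whose details are not given here; the function name, argument names, and numerical values are hypothetical.

```python
import math


def eds_reference_energy(state_energies, offsets, beta, s=1.0):
    """Generic EDS reference energy for one configuration.

    state_energies: energies V_i of each end state (hypothetical inputs)
    offsets:        energy offsets E_i, one per state
    beta:           1 / (k_B T) in units matching the energies
    s:              smoothness parameter (0 < s <= 1)
    """
    if len(state_energies) != len(offsets):
        raise ValueError("need one offset per state")
    if not state_energies:
        raise ValueError("need at least one state")

    # Exponents -beta * s * (V_i - E_i) for every state.
    exponents = [-beta * s * (v - e) for v, e in zip(state_energies, offsets)]

    # Log-sum-exp with the maximum factored out for numerical stability.
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(x - m) for x in exponents))

    return -log_sum / (beta * s)


if __name__ == "__main__":
    beta = 1.0 / 0.593  # assumed: k_B T of about 0.593 kcal/mol near room temperature
    print(eds_reference_energy([-10.0, -4.0], [0.0, 0.0], beta))
```

By construction the reference energy lies at or below the lowest offset-corrected state energy, and no more than ln(N)/(βs) below it for N states, which is why it envelops all states at once.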
Implementation of the Spherical Grids and Treecode Summation Algorithm

Evaluating pairwise electrostatic interactions is the most time-consuming part of molecular dynamics (MD) simulations because of the slow decay of the Coulomb operator. The Spherical Grid and Treecode (SGT) summation algorithm speeds up the calculation by factoring the Coulomb operator into short-ranged and long-ranged terms. The short-ranged term is handled by computing only the pairwise interactions within a cutoff radius, and the long-ranged term is approximated using numerical cubature techniques. The SGT algorithm's primary advantage over traditional methods is that it does not use the fast Fourier transform (FFT), which requires multiple all-to-all communications on multi-node systems. Avoiding the FFT makes the SGT algorithm well suited to large CPU clusters and multi-GPU systems. In this work, we plan to develop a standalone library that implements the SGT algorithm. We have developed pilot implementations and are developing high-performance versions for CPU-based and GPU-based platforms. The CPU version is parallelized using a hybrid OpenMPI/OpenMP approach, and the GPU version uses the CUDA API. We are currently optimizing the CPU and single-GPU versions and plan to develop a multi-GPU version in the future.

Other active projects:
  • Rotatable Grids for Grid Inhomogeneous Solvation Theory Calculations
  • Free energy calculation with Double Exponential Potential
  • A spatial Self-Guided Molecular Simulation Method
  • Free Energy Profile Decomposition Analysis
  • Machine Learning Potentials (MLPs) for Reactive Systems
  • Conceptual Density Functional Theory (DFT) Analysis for Enzyme Catalysis
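Returning to the SGT algorithm, the idea of factoring the Coulomb operator into short-ranged and long-ranged terms can be illustrated with a small sketch. The abstract does not state which splitting function SGT uses, so this example assumes the erf/erfc split familiar from Ewald-type methods as a stand-in; the function name, the splitting parameter `alpha`, and the distances are hypothetical.

```python
import math


def split_coulomb(r, alpha):
    """Split 1/r into a short-ranged and a long-ranged part.

    Assumed split (not necessarily the one used by SGT):
        1/r = erfc(alpha * r) / r  +  erf(alpha * r) / r
    The erfc term decays quickly and can be truncated at a cutoff;
    the erf term is smooth and is the part left to an approximate
    long-range method.
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    short_range = math.erfc(alpha * r) / r
    long_range = math.erf(alpha * r) / r
    return short_range, long_range


if __name__ == "__main__":
    alpha = 0.35  # illustrative splitting parameter, inverse length units
    for r in (1.0, 4.0, 8.0, 12.0):
        sr, lr = split_coulomb(r, alpha)
        print(f"r={r:5.1f}  short={sr:.3e}  long={lr:.3e}  sum={sr + lr:.6f}")
```

With these illustrative values the short-ranged term is negligible by a distance of 12, so only pairs inside a cutoff need to be evaluated directly, while the smooth remainder carries the slowly decaying part of the interaction.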

Project Outcomes

Journal articles (46)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Global organization of a binding site network gives insight into evolution and structure-function relationships of proteins.
  • DOI:
    10.1038/s41598-017-10412-z
  • Publication date:
    2017-09-14
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Lee J; Konc J; Janežič D; Brooks BR
  • Corresponding author:
    Brooks BR
The Extended Eighth-Shell method for periodic boundary conditions with rotational symmetry.
  • DOI:
    10.1002/jcc.26545
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    3
  • Authors:
    Prasad, Samarjeet; Simmonett, Andrew C; Meana-Pañeda, Rubén; Brooks, Bernard R
  • Corresponding author:
    Brooks, Bernard R
Reformulation of the self-guided molecular simulation method.
  • DOI:
    10.1063/5.0019086
  • Publication date:
    2020-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiongwu Wu; B. Brooks
  • Corresponding author:
    Xiongwu Wu;B. Brooks
Predicting hydration free energies with a hybrid QM/MM approach: an evaluation of implicit and explicit solvation models in SAMPL4.
  • DOI:
    10.1007/s10822-014-9708-4
  • Publication date:
    2014-03
  • Journal:
  • Impact factor:
    3.5
  • Authors:
    Koenig, Gerhard; Pickard, Frank C.; Mei, Ye; Brooks, Bernard R.
  • Corresponding author:
    Brooks, Bernard R.
A replica exchange umbrella sampling (REUS) approach to predict host-guest binding free energies in SAMPL8 challenge.
  • DOI:
    10.1007/s10822-021-00385-7
  • Publication date:
    2021-05
  • Journal:
  • Impact factor:
    3.5
  • Authors:
    Ghorbani M; Hudson PS; Jones MR; Aviat F; Meana-Pañeda R; Klauda JB; Brooks BR
  • Corresponding author:
    Brooks BR

Other grants held by Bernard R Brooks

Development Of Theoretical Methods For Studying Biological Macromolecules
  • Grant number:
    8557904
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Molecular Dynamics Simulations Of Biological Macromolecules
  • Grant number:
    7968988
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Molecular Dynamics Simulations Of Biological Macromolecules
  • Grant number:
    8939759
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Three-dimensional Structures Of Biological Macromolecules
  • Grant number:
    7594372
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Molecular Dynamics Simulations Of Biological Macromolecules
  • Grant number:
    10262664
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Development Of Advanced Computer Hardware And Software
  • Grant number:
    10706226
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Development Of Theoretical Methods For Studying Biological Macromolecules
  • Grant number:
    7734954
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Development Of Theoretical Methods For Studying Biological Macromolecules
  • Grant number:
    8158018
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Molecular Dynamics Simulations of Biological Macromolecules
  • Grant number:
    6109190
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:
Development of Advanced Computer Hardware and Software
  • Grant number:
    6109192
  • Fiscal year:
  • Funding amount:
    $1,117,800
  • Project category:

Similar overseas grants

NSF-BSF: Towards a Molecular Understanding of Dynamic Active Sites in Advanced Alkaline Water Oxidation Catalysts
  • Grant number:
    2400195
  • Fiscal year:
    2024
  • Funding amount:
    $1,117,800
  • Project category:
    Standard Grant
Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
  • Grant number:
    2334970
  • Fiscal year:
    2024
  • Funding amount:
    $1,117,800
  • Project category:
    Standard Grant
Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
  • Grant number:
    2334969
  • Fiscal year:
    2024
  • Funding amount:
    $1,117,800
  • Project category:
    Standard Grant
Mechanochemical synthesis of nanocarbon and design of active sites for oxygen reduction/evolution reactions
  • Grant number:
    23K04919
  • Fiscal year:
    2023
  • Funding amount:
    $1,117,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Creation of porous inorganic frameworks with controlled structure of metal active sites by the building block method.
  • Grant number:
    22KJ2957
  • Fiscal year:
    2023
  • Funding amount:
    $1,117,800
  • Project category:
    Grant-in-Aid for JSPS Fellows
Catalysis of Juxtaposed Active Sites Created in Nanospaces and Their Applications
  • Grant number:
    23K04494
  • Fiscal year:
    2023
  • Funding amount:
    $1,117,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Generation of carbon active sites by modifying the oxygen containing functional groups and structures of carbons for utilizing to various catalytic reactions.
  • Grant number:
    23K13831
  • Fiscal year:
    2023
  • Funding amount:
    $1,117,800
  • Project category:
    Grant-in-Aid for Early-Career Scientists
CAREER: CAS: Understanding the Chemistry of Palladium and Silyl Compounds to Design Catalyst Active Sites
  • Grant number:
    2238379
  • Fiscal year:
    2023
  • Funding amount:
    $1,117,800
  • Project category:
    Continuing Grant
CAS: Collaborative Research: Tailoring the Distribution of Transient vs. Dynamic Active Sites in Solid-Acid Catalysts and Their Impacts on Chemical Conversions
  • Grant number:
    2154399
  • Fiscal year:
    2022
  • Funding amount:
    $1,117,800
  • Project category:
    Standard Grant
Engineering of Active Sites in Heterogeneous Catalysts for Sustainable Chemical and Fuel Production.
  • Grant number:
    RGPIN-2019-06633
  • Fiscal year:
    2022
  • Funding amount:
    $1,117,800
  • Project category:
    Discovery Grants Program - Individual