Machine Learning Approaches to Predict Enzyme Function

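Basic information

  • Grant number:
    BB/I00596X/1
  • Principal investigator:
  • Amount:
    $337,800
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2011
  • Funding country:
    United Kingdom
  • Duration:
    2011 to (no data)
  • Project status:
    Completed

Project abstract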

Proteins are amongst the most important of all molecules in biological systems. They are crucial to organisms, which use them to carry out a huge variety of essential functions: catalysis, transport, storage, motor functions, signalling, chaperoning folding, regulation, molecular recognition, structural roles, and DNA repair. As proteins are so ubiquitous in biology, understanding their properties is essential if we want to know about biological processes. This project is focused on one of the most significant of all protein functions: enzyme catalysis. Enzymes catalyse, or facilitate, the chemical reactions that occur in living organisms. Understanding how they work is both interesting in itself and useful in areas as diverse as drug design, diagnostics, biofuels, food science and laundry. This project is about the relationship between the structure of a protein and the enzyme function it carries out. We aim to predict the catalytic functionality from a knowledge of the protein structure. In order to achieve this, we will use machine learning methods, and in particular a technique called Random Forest. The forest consists of several hundred 'decision trees', each of which is basically a flow diagram. We will train them to learn patterns in the known properties of existing enzyme structures and the chemistry of the steps comprising the reactions they catalyse. The way in which we will generate the trees, however, involves computer-simulated dice-rolling. This will ensure that they are all different, though based on the same underlying information. The decision trees then each make a prediction of the unknown possible catalytic functions. These predictions are treated as votes as to the function of the protein. This voting process produces a consensus of many decision trees and maximises the use of the information contained in the underlying data, generating results which are much more accurate than those of any one decision tree.
The prediction of enzyme function is immensely important for a number of reasons. Firstly, being able to predict enzyme function more accurately will improve the functional annotation of genomes and reduce the current risk of misannotations being propagated through bioinformatics databases. Rapid developments in structural genomics, the high-throughput structure determination of diverse proteins from a wide variety of organisms, mean that many structures are available for enzymes whose functions are not yet known. Secondly, this project will allow us to recognise chemical similarities between evolutionarily unrelated enzymes that catalyse similar steps, though not necessarily similar overall reactions. Thirdly, this work will help us to understand the key determinants of the complex relationship between protein structure, function and evolution, particularly in terms of catalysis of reaction steps. Fourthly, the project will facilitate the design of new enzymes with either novel functions or carefully modified versions of existing functions. This project sits at an interface between disciplines, combining chemistry, biology and computer science. A wide range of skills and expertise is necessary to increase our understanding of catalysis, which has long been an important academic goal. Commercially, this work lays a foundation which is directly useful to the pharmaceutical and biotechnology industries, where enzymes are used as both diagnostics and therapeutics; to the agrochemical industry, whose products often target enzymes; in the development of biofuels, which need robust enzymes to improve productivity and reduce costs; in laundry, where enzymes are already used in everyday products; and in the nutrition and food industries. In particular, this project will aid in the design of new and repurposed enzymes.
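
As a rough illustration of the Random Forest idea described in the abstract (many randomised decision trees built from the same data, followed by a majority vote), here is a minimal pure-Python sketch. It is not the project's implementation. The descriptor values, the class labels `hydrolase` and `transferase`, and all function names are made up for illustration, and the bootstrap-plus-random-feature scheme is the standard Random Forest recipe rather than anything the abstract specifies.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, rng, n_try, depth=0, max_depth=5):
    """Grow one tree; each split considers a random subset of n_try features."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)                        # leaf node
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):   # the "dice-rolling"
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children; lower is better.
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                   # no usable split found
        return majority(labels)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, n_try, depth + 1, max_depth)

    return (f, t, grow(left), grow(right))             # internal node

def predict_tree(tree, row):
    """Follow the flow diagram from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, n_trees=200, seed=0):
    """Train n_trees trees, each on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """Each tree casts one vote; return the winning label and the tally."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0], votes

# Toy data: three hypothetical structural descriptors per protein.
rows = [
    (0.10, 0.20, 0.15), (0.20, 0.10, 0.25), (0.15, 0.30, 0.10), (0.30, 0.25, 0.20),
    (0.80, 0.90, 0.85), (0.90, 0.70, 0.75), (0.75, 0.80, 0.90), (0.85, 0.75, 0.80),
]
labels = ["hydrolase"] * 4 + ["transferase"] * 4

forest = train_forest(rows, labels, n_trees=200, seed=0)
label, votes = predict_forest(forest, (0.05, 0.05, 0.05))
print(label, dict(votes))
```

The fixed `seed` only makes the sketch reproducible. The vote tally returned alongside the label gives a crude indication of how strongly the trees agree.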

Project outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Parzen Window method: In terms of two vectors and one matrix.
  • DOI:
    10.1016/j.patrec.2015.06.002
  • Published:
    2015-10-01
  • Journal:
  • Impact factor:
    5.1
  • Authors:
    Mussa HY; Mitchell JB; Afzal AM
  • Corresponding author:
    Afzal AM
From sequence to enzyme mechanism using multi-label machine learning.
  • DOI:
    10.1186/1471-2105-15-150
  • Published:
    2014-05-19
  • Journal:
  • Impact factor:
    3
  • Authors:
    De Ferrari L; Mitchell JB
  • Corresponding author:
    Mitchell JB
A note on utilising binary features as ligand descriptors.
  • DOI:
    10.1186/s13321-015-0105-3
  • Published:
    2015
  • Journal:
  • Impact factor:
    8.6
  • Authors:
    Mussa HY; Mitchell JB; Glen RC
  • Corresponding author:
    Glen RC
Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces.
  • DOI:
    10.1186/s13104-015-1730-7
  • Published:
    2015-12-03
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Mussa HY; De Ferrari L; Mitchell JB
  • Corresponding author:
    Mitchell JB
Machine learning methods in chemoinformatics.

Other publications by John Mitchell

The Origin, Nature, and Importance of Soil Organic Constituents having Base Exchange Properties
  • DOI:
    10.2134/agronj1932.00021962002400040002x
  • Published:
    1932
  • Journal:
  • Impact factor:
    2.1
  • Authors:
    John Mitchell
  • Corresponding author:
    John Mitchell
Securing the Future of GenAI: Policy and Technology
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mihai Christodorescu; Ryan Craven; S. Feizi; Neil Gong; Mia Hoffmann; Somesh Jha; Zhengyuan Jiang; Mehrdad Saberi Kamarposhti; John Mitchell; Jessica Newman; Emelia Probasco; Yanjun Qi; Khawaja Shams; Matthew Turek
  • Corresponding author:
    Matthew Turek
The creativity quotient: An objective scoring of ideational fluency
  • DOI:
    10.1080/10400410409534552
  • Published:
    2004
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    A. Snyder; John Mitchell; T. Bossomaier; G. Pallier
  • Corresponding author:
    G. Pallier
Uncertainty in the IPCC's Third Assessment Report
  • DOI:
    10.1126/science.1062823
  • Published:
    2001
  • Journal:
  • Impact factor:
    56.9
  • Authors:
    M. Allen; S. Raper; John Mitchell
  • Corresponding author:
    John Mitchell
Identification of organic compounds by microscopy and X-ray diffractometry
  • DOI:
    10.1007/bf01216628
  • Published:
    1956-01-01
  • Journal:
  • Impact factor:
    5.3
  • Authors:
    John Mitchell; Ada L. Ryland
  • Corresponding author:
    Ada L. Ryland

Other grants by John Mitchell

AMPS: Mathematical Foundations of Market Operations with Renewable Bidders
  • Grant number:
    2229335
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant
AMPS: Rank Minimization Algorithms for Wide-Area Phasor Measurement Data Processing
  • Grant number:
    1736326
  • Fiscal year:
    2017
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant
SaTC-EDU: EAGER: Cybersecurity education for public policy
  • Grant number:
    1500089
  • Fiscal year:
    2015
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant
Collaborative Research: Binary Constrained Convex Quadratic Programs with Complementarity Constraints and Extensions
  • Grant number:
    1334327
  • Fiscal year:
    2013
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant
Random Forest Prediction of Protein-Ligand Binding Affinities
  • Grant number:
    BB/G000247/1
  • Fiscal year:
    2009
  • Funding amount:
    $337,800
  • Project category:
    Research Grant
Machine Learning Methods for Predicting Phospholipidosis
  • Grant number:
    EP/F049102/1
  • Fiscal year:
    2008
  • Funding amount:
    $337,800
  • Project category:
    Research Grant
Collaborative Research: CT-M: Privacy, Compliance and Information Risk in Complex Organizational Processes
  • Grant number:
    0831199
  • Fiscal year:
    2008
  • Funding amount:
    $337,800
  • Project category:
    Continuing Grant
Cutting Planes and Surfaces, and Conic Programming
  • Grant number:
    0715446
  • Fiscal year:
    2007
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant
Collaborative Research: High-Fidelity Methods for Security Protocols
  • Grant number:
    0430594
  • Fiscal year:
    2004
  • Funding amount:
    $337,800
  • Project category:
    Continuing Grant
Polyhedral and Non-polyhedral Cutting Plane Methods: Theory, Algorithms and Applications
  • Grant number:
    0317323
  • Fiscal year:
    2003
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant

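Similar grants from the National Natural Science Foundation of China

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
  • Project category:
    Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
  • Grant number:
    n/a
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Project category:
    Provincial/municipal project
Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowdsensing tasks for coal mine safety
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Short-time online Q-learning cooperative control mechanism for intelligent munition formations, taking leader failure into account
  • Grant number:
    62003314
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Project category:
    Young Scientists Fund
Research on e-learning resource recommendation methods integrating contextual tensor factorization
  • Grant number:
    61902016
  • Approval year:
    2019
  • Funding amount:
    CNY 240,000
  • Project category:
    Young Scientists Fund
Research on Spiking-Transfer learning methods with temporal transfer capability
  • Grant number:
    61806040
  • Approval year:
    2018
  • Funding amount:
    CNY 200,000
  • Project category:
    Young Scientists Fund
Research on deep-learning-based dynamic identification technology for glacier monitoring in the Sanjiangyuan region
  • Grant number:
    51769027
  • Approval year:
    2017
  • Funding amount:
    CNY 380,000
  • Project category:
    Regional Science Fund
Research on Spiking-Deep Learning methods with temporal processing capability
  • Grant number:
    61573081
  • Approval year:
    2015
  • Funding amount:
    CNY 640,000
  • Project category:
    General Program
Automatic generation and optimization of large-scale personalized e-learning process models based on directed hypergraphs
  • Grant number:
    61572533
  • Approval year:
    2015
  • Funding amount:
    CNY 660,000
  • Project category:
    General Program
Research on learner emotion compensation methods in e-learning
  • Grant number:
    61402392
  • Approval year:
    2014
  • Funding amount:
    CNY 260,000
  • Project category:
    Young Scientists Fund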

Similar overseas grants

Automating data acquisition and data processing pipeline via artificial intelligence and machine learning approaches to allow at-home use of a novel breast cancer screening method employing bra-based elastography imaging.
  • Grant number:
    486956
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Operating Grants
Developing machine learning based approaches to weld residual stress problems
  • Grant number:
    2894296
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Studentship
Determining the ototoxic potential of COVID-19 therapeutics using machine learning and in vivo approaches
  • Grant number:
    10732745
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
Developing novel machine learning approaches to studying cell development
  • Grant number:
    2326879
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Continuing Grant
Research Initiation Award: Uncovering and Extracting Biological Information from Nanopore Long-read Sequencing Data with Machine Learning and Mathematical Approaches
  • Grant number:
    2300445
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Standard Grant
Improving aerosol and spray process computation fluid dynamics models with machine learning approaches
  • Grant number:
    2881557
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Studentship
CAREER: Combining Machine Learning and Physics-based Modeling Approaches for Accelerating Scientific Discovery
  • Grant number:
    2239175
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Continuing Grant
Target identification from multiomics data using systems biology and machine learning approaches
  • Grant number:
    BB/Y512734/1
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Training Grant
Constructing a Digital Twin for a self-correcting Scanning Transmission Electron Microscope using Machine Learning Approaches
  • Grant number:
    2889721
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Studentship
Cheminformatics and Machine Learning approaches for GPCR Computer Aided Drug Design
  • Grant number:
    BB/X511778/1
  • Fiscal year:
    2023
  • Funding amount:
    $337,800
  • Project category:
    Training Grant