Next-Generation Protein Engineering: Machine Learning for Enzyme Engineering
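Basic information
- Award number: 1937902
- Principal investigator:
- Amount: $787,500
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-10-01 to 2024-09-30
- Status: Completed
- Source:
- Keywords: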
Project abstract
Enzymes are proteins that catalyze reactions. They are used in applications such as detergents that remove stains, sensors that detect blood sugar levels, and industrial processes such as the production of high-fructose corn syrup. Enzymes can be engineered to improve their performance using a method called directed evolution (DE), a process of repeated gene sequence mutation and enzyme screening that determines the change in performance caused by the mutated sequences. A side product of DE experiments is an abundance of unused data. This project will incorporate machine learning (ML) into the DE workflow. These and other data will train the ML algorithm to recognize and ultimately predict protein structures that are useful in creating the desired enzyme activity. The objective is to make novel and useful enzymes more rapidly and at lower cost.
The diversity of life arises in great measure from the ability of proteins to evolve and adapt. The permutations of the 20 proteinogenic amino acids allow for supra-astronomical numbers of possible protein sequences, the vast majority of which do not fold or encode a useful function. We propose that machine learning (ML) models can use information gained from directed evolution (DE) experiments to improve the efficiency of searching this space for functional proteins. While techniques such as DE implicitly rely on an underlying structure of the functional landscape of protein sequence space, explicitly modeling this structure would allow for far more efficient search algorithms. We recently demonstrated a data-driven ML approach to guiding DE experiments that accounts for the epistatic nature of protein mutations and enables multiple beneficial mutations to be incorporated in a single generation of mutation and screening. We aim to further develop this workflow to address multiple tasks simultaneously, specifically to predict enzyme activity across multiple substrates.
To accomplish this, we propose to incorporate information about multiple substrates into ML-guided directed evolution and to use validated encodings for proteins and substrates that were developed for separate predictive tasks. However, these encodings are not optimized to work together for ML. We therefore propose to develop new encodings that describe the components of an enzymatic system in a cohesive and synergistic manner. Finally, while directed evolution has successfully adapted enzymes for human applications, this process currently requires expert knowledge and intensive trial and error for each engineering task. The requirement for expert knowledge is clearest when approaching the formidable challenge of finding starting activity for DE. Using ML, we will build a model that can predict the non-natural carbene/nitrene transfer activities of P450s against target substrates, based on what is known of the natural substrate(s) of these enzymes.
This award is cofunded by the Cellular and Biochemical Engineering Program in the Division of Chemical, Bioengineering, Environmental and Transport Systems and the Systems and Synthetic Biology Program in the Division of Molecular and Cellular Biosciences. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
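The ML-guided workflow the abstract describes (screen some variants, fit a model to the measurements, then use the model to choose which variants to screen next) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-site library, the activity values, the reduced six-letter alphabet, and the distance-weighted nearest-neighbor predictor are hypothetical stand-ins, not the project's data, encodings, or model.

```python
from itertools import product

# Hypothetical toy data: variants of a 3-site combinatorial library mapped to
# a measured activity. All values are invented for illustration only.
measured = {
    "AVL": 1.0,
    "AVF": 1.4,
    "GVL": 0.9,
    "GIF": 2.1,
    "AIL": 1.2,
}

ALPHABET = "AGVILF"  # assumed reduced alphabet, to keep the example small


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(seq, data, k=3):
    """Distance-weighted mean activity of the k nearest measured variants."""
    nearest = sorted(data, key=lambda s: (hamming(seq, s), s))[:k]
    weights = [1.0 / (1 + hamming(seq, s)) for s in nearest]
    return sum(w * data[s] for w, s in zip(weights, nearest)) / sum(weights)


def propose(data, n_sites=3, top=5):
    """Rank every unmeasured variant by predicted activity; return the best."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=n_sites))
    unseen = [s for s in candidates if s not in data]
    return sorted(unseen, key=lambda s: (-predict(s, data), s))[:top]


library_size = len(ALPHABET) ** 3  # 216 variants in this toy library
full_space = 20 ** 3               # 8000 if all 20 amino acids were allowed
next_round = propose(measured)     # variants to screen in the next round
```

Even at three sites the full 20-letter space already holds 8000 sequences, which is why a model that ranks unscreened variants can reduce screening effort. In a real campaign the predictor would be replaced by a model trained on learned or physicochemical encodings, and the proposed variants would be screened and fed back into `measured` for the next round.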
Project outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library
- DOI: 10.1021/acssynbio.1c00592
- Published: 2022-03-18
- Journal:
- Impact factor: 4.7
- Authors: Wittmann, Bruce J.; Johnston, Kadina E.; Arnold, Frances H.
- Corresponding author: Arnold, Frances H.
Advances in machine learning for directed evolution
- DOI: 10.1016/j.sbi.2021.01.008
- Published: 2021-08-01
- Journal:
- Impact factor: 6.8
- Authors: Wittmann, Bruce J.; Johnston, Kadina E.; Arnold, Frances H.
- Corresponding author: Arnold, Frances H.
Other publications by Frances Arnold
MicroED structure of Aeropyrum pernix protoglobin
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: E. Danelius; T. Gonen; Frances Arnold; Nicholas K. Porter
- Corresponding author: Nicholas K. Porter
Other grants held by Frances Arnold
Evolving Hemoproteins for New-to-Nature Ring-Forming Reactions
- Award number: 2016137
- Fiscal year: 2020
- Award amount: $787,500
- Project type: Standard Grant
Expanding the Enzyme Repertoire by Evolution and Engineering
- Award number: 1513007
- Fiscal year: 2015
- Award amount: $787,500
- Project type: Standard Grant
SusChEM: Engineering and Evolution of Cytochrome P450 Enzymes for Non-Natural Chemistry
- Award number: 1403077
- Fiscal year: 2014
- Award amount: $787,500
- Project type: Standard Grant
Collaborative Research: Metabolically Engineered Organisms for Conversion of Cellulose to Isobutanol
- Award number: 0903817
- Fiscal year: 2009
- Award amount: $787,500
- Project type: Standard Grant
BIC: Collaborative Research: Evolutionary Optimization of Biological Circuits: Towards Cellular Programming
- Award number: 0522831
- Fiscal year: 2005
- Award amount: $787,500
- Project type: Continuing Grant
Laboratory Evolution of Biocatalysts for Methane Hydroxylation and Alkene Epoxidation
- Award number: 0313567
- Fiscal year: 2003
- Award amount: $787,500
- Project type: Continuing Grant
Qubic: Biological Information Technology Systems: Self-Perfecting Genetic Circuits
- Award number: 0130613
- Fiscal year: 2002
- Award amount: $787,500
- Project type: Continuing Grant
ME: Interagency Announcement of Opportunities in Metabolic Engineering: Laboratory Evolution of Carotenoid Biosynthetic Pathways
- Award number: 0118565
- Fiscal year: 2001
- Award amount: $787,500
- Project type: Continuing Grant
Tools for Directed Evolution of Oxygenases: High Throughput Screening of Epoxidation and Hydroxylation Catalysts
- Award number: 9981770
- Fiscal year: 2000
- Award amount: $787,500
- Project type: Continuing Grant
A Microfabricated Cell Sorter for Molecular Evolution
- Award number: 9901495
- Fiscal year: 1999
- Award amount: $787,500
- Project type: Continuing Grant
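Similar NSFC (National Natural Science Foundation of China) grants
Next Generation Majorana Nanowire Hybrids
- Award number:
- Approval year: 2020
- Award amount: CNY 200,000
- Project type: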
Similar overseas grants
Unlocking the Next-Generation Protein Expression Systems for Animal-Free Cheese Production
- Award number: 10074086
- Fiscal year: 2023
- Award amount: $787,500
- Project type: Collaborative R&D
Supercharged protein-surfactant bioconjugates for next-generation cell therapies
- Award number: MR/X01116X/1
- Fiscal year: 2023
- Award amount: $787,500
- Project type: Fellowship
Alternative protein sources: growing the next generation computational modelling framework
- Award number: 2886049
- Fiscal year: 2023
- Award amount: $787,500
- Project type: Studentship
ProNaGen: Engineering of Recombinant Protein Nanosheet-Based Bioemulsions for Next Generation Bioprocessing and Biomanufacturing
- Award number: EP/Y032667/1
- Fiscal year: 2023
- Award amount: $787,500
- Project type: Research Grant
Next-Generation Adaptive Evolution Toolkit to Increase Protein Production in Precision Fermentation
- Award number: 10068375
- Fiscal year: 2023
- Award amount: $787,500
- Project type: Collaborative R&D
Next generation PHOTACS: A light switchable system for enhanced spatial and temporal precision in targeted protein degradation
- Award number: 2753267
- Fiscal year: 2022
- Award amount: $787,500
- Project type: Studentship
Exploiting protein stability changes to identify next generation chemical probes and target engagement technologies
- Award number: RGPIN-2022-03107
- Fiscal year: 2022
- Award amount: $787,500
- Project type: Discovery Grants Program - Individual
Targeting methyl-CpG-binding domain protein 2 (Mbd2) towards the development of next generation precision Epi-therapies against metastatic breast cancer
- Award number: 464539
- Fiscal year: 2022
- Award amount: $787,500
- Project type: Operating Grants
Targeting methyl-CpG-binding domain protein 2 (Mbd2) towards the development of next generation precision Epi-therapies against metastatic breast cancer
- Award number: 472577
- Fiscal year: 2022
- Award amount: $787,500
- Project type: Operating Grants
Accelerating the discovery of next-generation cancer therapeutics through exploring protein fitness landscapes using a machine learning-driven evolution engine
- Award number: 10032925
- Fiscal year: 2022
- Award amount: $787,500
- Project type: Collaborative R&D