Transparent Deep Learning for Directed Protein Evolution

Basic information

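Grant number:
2745409
Principal investigator:
Amount:
--
Host institution:
University of Edinburgh
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to no data available
Project status:
Not yet concluded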

Source:
https://gtr.ukri.org/projects?ref=studentship-2745409
Keywords:
Transparent Deep Learning Directed Protein

Project abstract

Protein engineering is a complex process that requires finding an amino acid sequence associated with a desired function. Because the design space grows exponentially with the number of residues, de novo design is currently an intractable problem. To overcome this complexity, scientists routinely rely on Directed Evolution (DE) [1], an iterative process of random mutagenesis and selection of protein variants. While this process has led to remarkable results, it is extremely slow, low-throughput and expensive, because the probability of generating functional proteins at each step is low. Thus, for the last 30 years, scientists have developed biophysical models and optimisation methods to predict protein structure and function in silico; however, these methods usually do not scale to large proteins and are limited by the accuracy of the underlying biophysical models. Recently, Machine Learning (ML) and, in particular, Deep Learning (DL) have largely overcome these problems by learning functional relationships associated with protein folding and function directly from data [2]. However, how a DL model arrives at its structural and functional predictions remains opaque and difficult to understand [3], which limits the utility of these models for understanding the biological design principles associated with functional proteins.

AIMS AND OBJECTIVES: In collaboration with ZenithAI (OT/ZAI), we propose to design and build transparent and explainable deep learning models for protein design. The protein design space increases exponentially with the number of amino acid positions considered, but functional proteins are extremely rare. Therefore, transparent models can provide a principled protein selection method by looking only at important and uncertain amino acid positions, ultimately reducing the burden of experimental screening of protein variants.

WORKPLAN. The project is structured in three work packages.
- WP1 - The student will develop a deep learning framework for protein engineering, using state-of-the-art variational and adversarial models coupled with sequence-to-sequence models, which will be trained using curated protein sequence information stratified by species and function.
- WP2 - The student will then develop probabilistic models to quantify uncertainty in designs by exploiting gradient and weight information learned by the model, ultimately to define a score to prioritise proteins for experimental testing.
- WP3 - The student will use the model to design variants of the human S1PL enzyme, which will then be tested in the lab. S1PL is a central enzyme in the sphingolipid pathway, which is essential for proper cell function, and it has a causal role in many diseases, including cancer and neurodegenerative disorders.

TRAINING PROGRAM. The student will receive training in machine learning, statistical learning and deep learning, and will build a competitive profile in biological sequence modelling and design. The student will also be introduced to the emerging field of synthetic biology and will learn modern DNA cloning and assembly techniques and the use of protein expression systems at scale. We also put a strong emphasis on reproducible research; the student will receive training in advanced research software engineering and in reproducible workflows for data analyses.
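
The exponential growth of the design space mentioned in the abstract can be made concrete with a short calculation. The sketch below is illustrative only and is not part of the project record: it assumes an alphabet of the 20 canonical amino acids, and the function names are hypothetical.

```python
from math import comb

# Assumption: only the 20 canonical amino acids are considered.
AMINO_ACIDS = 20


def design_space_size(n_positions: int) -> int:
    """Number of distinct sequences when n_positions are free to vary."""
    return AMINO_ACIDS ** n_positions


def variants_with_k_mutations(n_positions: int, k: int) -> int:
    """Number of variants that differ from a fixed parent at exactly k positions.

    Choose which k positions change, then pick one of the 19 other
    amino acids at each of them.
    """
    return comb(n_positions, k) * (AMINO_ACIDS - 1) ** k


if __name__ == "__main__":
    length = 300  # hypothetical protein length, chosen for illustration
    print("single mutants:", variants_with_k_mutations(length, 1))
    print("double mutants:", variants_with_k_mutations(length, 2))
    print("five free positions:", design_space_size(5))
    print("digits in full space size:", len(str(design_space_size(length))))
```

For a hypothetical 300-residue protein this gives 5,700 single mutants (300 × 19), while restricting a design to five selected positions leaves 20^5 = 3,200,000 candidate sequences. That is the sense in which focusing on a few important positions reduces the screening burden compared with the full 20^300 space.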