Development of data-driven sampling and its application to protein design and variant prediction
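Basic information
- Grant number: RGPIN-2017-06421
- Principal investigator:
- Amount: $24,800
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Duration: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract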
In previous work, we developed machine-learning-based methods to solve various prediction problems in structural biology, including predicting changes in stability and binding affinity upon mutation. We have also developed a new protein engineering platform that tightly couples computational design with high-throughput screening. However, these approaches still relied on conventional, physics-inspired modeling methods. Here, we plan to develop more data-driven methods for protein modeling.
One fundamental issue in computational biochemistry is conformational sampling. A protein in nature is a dynamic entity and adopts many different conformations even in its native state, let alone once perturbations are introduced. Exhaustive exploration of the possible conformations using conventional methods is either computationally very expensive or lacking in accuracy. Protein backbones in particular present a challenge, whereas side chains can generally be modeled well using so-called rotamers. We propose to develop methods based on dimensionality reduction to make sampling much more efficient and accurate. The rationale is that a protein's atoms are strongly constrained in their movement, and many of these natural constraints can be learned from existing structures of different conformations; by now, most important proteins have structures available in multiple conformations. We plan to develop a method based on Gaussian Process Latent Variable Models. Using this method, efficient subspaces can be learned from existing protein structures, and the dimensionality can potentially be reduced by orders of magnitude, speeding up sampling enormously while still generating accurate conformations. We will first establish this as a method to sample protein backbone conformations in a number of established protein scaffolds. Then, we will apply it to the problem of protein design and incorporate it into our previously developed protein engineering framework, where it will lead to improved design accuracy. Finally, we will implement it as a backbone relaxation method in our predictor of mutation effects on protein stability and binding affinity. The improved backbone sampling will also enable us to implement prediction of the effects of indels and post-translational modifications in this framework. Our novel sampling methodology will have a strong impact on many other problems in the field as well.
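The idea of learning a low-dimensional subspace from known conformations and then sampling inside it can be illustrated with a small sketch. This is not the project's method: it swaps in principal component analysis, the linear special case of a Gaussian Process Latent Variable Model, and keeps a single latent dimension. The toy ensemble, the function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def make_toy_ensemble(rng, n_structures=50, n_coords=30):
    """Hypothetical data: one collective motion plus small per-coordinate noise."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_coords)]
    norm = math.sqrt(sum(c * c for c in raw))
    true_axis = [c / norm for c in raw]
    base = [rng.uniform(-10.0, 10.0) for _ in range(n_coords)]
    ensemble = []
    for _ in range(n_structures):
        amplitude = rng.gauss(0.0, 2.0)  # position along the collective motion
        ensemble.append([b + amplitude * a + rng.gauss(0.0, 0.05)
                         for b, a in zip(base, true_axis)])
    return true_axis, ensemble


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_direction(centered, rng, iters=100):
    """Top principal direction by power iteration on X^T X (never formed explicitly)."""
    v = [rng.gauss(0.0, 1.0) for _ in centered[0]]
    for _ in range(iters):
        scores = [dot(row, v) for row in centered]        # X v
        w = [dot(scores, col) for col in zip(*centered)]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        v = [c / norm for c in w]
    return v


def fit_subspace(ensemble, rng):
    """Learn the mean structure, one latent axis, and the spread along that axis."""
    n = len(ensemble)
    mean = [sum(col) / n for col in zip(*ensemble)]
    centered = [[x - m for x, m in zip(row, mean)] for row in ensemble]
    axis = leading_direction(centered, rng)
    latents = [dot(row, axis) for row in centered]
    sigma = math.sqrt(dot(latents, latents) / (n - 1))
    return mean, axis, sigma


def sample_conformation(mean, axis, sigma, rng):
    """Draw one latent value and map it back to full coordinate space."""
    z = rng.gauss(0.0, sigma)
    return [m + z * a for m, a in zip(mean, axis)]


rng = random.Random(0)
true_axis, ensemble = make_toy_ensemble(rng)
mean, axis, sigma = fit_subspace(ensemble, rng)
new_structure = sample_conformation(mean, axis, sigma, rng)
print(f"{len(new_structure)} coordinates generated from 1 latent variable")
print(f"overlap with the true motion: {abs(dot(axis, true_axis)):.3f}")
```

In this toy setting, sampling means drawing one number instead of thirty coordinates. A nonlinear latent variable model would replace the straight axis with a learned curved manifold, which is where a Gaussian process mapping would come in.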
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Kim, Philip
Atomically thin p-n junctions with van der Waals heterointerfaces
- DOI: 10.1038/nnano.2014.150
- Publication date: 2014-09-01
- Journal:
- Impact factor: 38.3
- Authors: Lee, Chul-Ho; Lee, Gwan-Hyoung; Kim, Philip
- Corresponding author: Kim, Philip
Heterostructures based on inorganic and organic van der Waals systems
- DOI: 10.1063/1.4894435
- Publication date: 2014-09-01
- Journal:
- Impact factor: 6.1
- Authors: Lee, Gwan-Hyoung; Lee, Chul-Ho; Kim, Philip
- Corresponding author: Kim, Philip
Electrical control of interlayer exciton dynamics in atomically thin heterostructures
- DOI: 10.1126/science.aaw4194
- Publication date: 2019-11-15
- Journal:
- Impact factor: 56.9
- Authors: Jauregui, Luis A.; Joe, Andrew Y.; Kim, Philip
- Corresponding author: Kim, Philip
Carbon wonderland
- DOI: 10.1038/scientificamerican0408-90
- Publication date: 2008-04-01
- Journal:
- Impact factor: 3
- Authors: Geim, Andre K.; Kim, Philip
- Corresponding author: Kim, Philip
Burden of Severe Illness Associated With Laboratory-Confirmed Influenza in Adults Aged 50-64 Years, 2010-2011 to 2016-2017.
- DOI: 10.1093/ofid/ofac664
- Publication date: 2023-01
- Journal:
- Impact factor: 4.2
- Authors: Kim, Philip; Coleman, Brenda; Kwong, Jeffrey C.; Plevneshi, Agron; Hassan, Kazi; Green, Karen; McNeil, Shelly A.; Armstrong, Irene; Gold, Wayne L.; Gubbay, Jonathan; Katz, Kevin; Kuster, Stefan P.; Lovinsky, Reena; Matukas, Larissa; Ostrowska, Krystyna; Richardson, David; McGeer, Allison
- Corresponding author: McGeer, Allison
Other grants held by Kim, Philip
Development of data-driven sampling and its application to protein design and variant prediction
- Grant number: RGPIN-2017-06421
- Fiscal year: 2021
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Development of data-driven sampling and its application to protein design and variant prediction
- Grant number: RGPIN-2017-06421
- Fiscal year: 2019
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Development of data-driven sampling and its application to protein design and variant prediction
- Grant number: RGPIN-2017-06421
- Fiscal year: 2018
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Development of data-driven sampling and its application to protein design and variant prediction
- Grant number: RGPIN-2017-06421
- Fiscal year: 2017
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Analysis of evolutionary mechanisms in signaling networks
- Grant number: 386671-2010
- Fiscal year: 2014
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Analysis of evolutionary mechanisms in signaling networks
- Grant number: 386671-2010
- Fiscal year: 2013
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Analysis of evolutionary mechanisms in signaling networks
- Grant number: 386671-2010
- Fiscal year: 2012
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Analysis of evolutionary mechanisms in signaling networks
- Grant number: 386671-2010
- Fiscal year: 2011
- Amount: $24,800
- Program: Discovery Grants Program - Individual
Analysis of evolutionary mechanisms in signaling networks
- Grant number: 386671-2010
- Fiscal year: 2010
- Amount: $24,800
- Program: Discovery Grants Program - Individual
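Similar grants from the National Natural Science Foundation of China
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Grant number:
- Approval year: 2024
- Amount:
- Program: Research Fund for International Young Scientists
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Amount:
- Program: Collaborative Innovation Research Team
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
- Grant number:
- Approval year: 2020
- Amount: CNY 400,000
- Program:
Estimation of high-dimensional volatility matrices based on high-frequency information and its applications
- Grant number: 71901118
- Approval year: 2019
- Amount: CNY 180,000
- Program: Young Scientists Fund
Efficient estimation of semiparametric spatial autoregressive panel models and its applications
- Grant number: 71961011
- Approval year: 2019
- Amount: CNY 160,000
- Program: Regional Science Fund Program
Statistical inference, forecasting, and applications of volatility for high-frequency data
- Grant number: 71971118
- Approval year: 2019
- Amount: CNY 500,000
- Program: General Program
Projective nonlinear non-negative tensor factorization based on individual analysis for pattern analysis of high-dimensional unstructured data
- Grant number: 61502059
- Approval year: 2015
- Amount: CNY 190,000
- Program: Young Scientists Fund
Key technologies for semantic interoperability of Web services based on Linked Open Data
- Grant number: 61373035
- Approval year: 2013
- Amount: CNY 770,000
- Program: General Program
New methods for volume data representation and rendering
- Grant number: 61170206
- Approval year: 2011
- Amount: CNY 550,000
- Program: General Program
A new class of regime-switching models and their application in financial modeling
- Grant number: 11061041
- Approval year: 2010
- Amount: CNY 240,000
- Program: Regional Science Fund Program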
Similar overseas grants
Development of a Physics-Data Driven Surface Flux Parameterization for Flow in Complex Terrain
- Grant number: 2336002
- Fiscal year: 2024
- Amount: $24,800
- Program: Continuing Grant
Development of Informatics Materials with an Awareness of the High School-University connection and a Learning Support Environment for Data-Driven Instruction
- Grant number: 23H01019
- Fiscal year: 2023
- Amount: $24,800
- Program: Grant-in-Aid for Scientific Research (B)
Development of data-driven multiple sound spot synthesis technology based on deep generative neural network models
- Grant number: 23K11177
- Fiscal year: 2023
- Amount: $24,800
- Program: Grant-in-Aid for Scientific Research (C)
EAGER: Development of a Hybrid Knowledge- and Data-Driven Approach to Guide the Design of Immunotherapeutic Cells
- Grant number: 2324742
- Fiscal year: 2023
- Amount: $24,800
- Program: Continuing Grant
Collaborative Research: Advancing the Science of STEM Interest Development through Educational Gameplay with Machine Learning and Data-driven Interviews
- Grant number: 2301173
- Fiscal year: 2023
- Amount: $24,800
- Program: Continuing Grant
Development of edible sorbent therapies to mitigate dietary exposures to per- and polyfluoroalkyl substances (PFAS)
- Grant number: 10590799
- Fiscal year: 2023
- Amount: $24,800
- Program:
Collaborative Research: Advancing the Science of STEM Interest Development through Educational Gameplay with Machine Learning and Data-driven Interviews
- Grant number: 2301172
- Fiscal year: 2023
- Amount: $24,800
- Program: Continuing Grant
Development of Data-Collection Algorithms and Data-Driven Control Methods for Guaranteed Stabilization of Nonlinear Systems with Uncertain Equilibria and Orbits
- Grant number: 23K03913
- Fiscal year: 2023
- Amount: $24,800
- Program: Grant-in-Aid for Scientific Research (C)
Research and development of vector image retrieval by cooperating knowledge-driven and data-driven models
- Grant number: 23K11121
- Fiscal year: 2023
- Amount: $24,800
- Program: Grant-in-Aid for Scientific Research (C)
Development of bioinformatic pipelines for multi omics data to model the effect of small molecule driven lipid metabolism on autophagy regulation
- Grant number: BB/Y512540/1
- Fiscal year: 2023
- Amount: $24,800
- Program: Training Grant