ECOD: Large scale classification of predicted and experimental protein structures

Basic Information

Project Summary

Classification of protein domains has historically served to contextualize the 3D structural data collectively generated by experimental structure determination methods such as X-ray crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. Our database, Evolutionary Classification of protein Domains (ECOD), has served the biological community for seven years, cataloguing evolutionary relationships between domains from experimental structures. The recent advent of high-accuracy structure prediction methods, such as AlphaFold (AF) and RoseTTAFold (RF), and the consequent release of 1 million predicted structures in the AlphaFold Database (AFDB) herald a paradigm shift in structural biology and domain classification. The rate of structure deposition is expected to increase between a hundred- and a thousand-fold. We propose to take advantage of this revolution and transform ECOD into a comprehensive classification of the entire protein universe using sequence, structure, and functional evidence. By simultaneously classifying experimental and predicted structures of proteins from model organisms and human pathogens, our classification will help the scientific community critically evaluate structure models and use the evolutionary information to discover and experimentally characterize protein function.

Classifying AF models challenges the ECOD pipeline with a 50-fold increase in workload and with the significant fraction of non-globular and low-quality regions in the models. Thus, our first Aim is to upgrade ECOD's infrastructure and develop methods to identify single domains from AF models and to integrate sequence, structure, and functional site similarities into our automatic classification.

Compared to the current ECOD workflow, which relies on human experts for structure- and function-based classification, these improvements will drastically decrease the need for manual curation and will allow us to achieve our second Aim: classifying domains of over 1 million released AF models into ECOD via a combination of computational pipelines and minimal manual effort (0.25% to 1% of cases). Using the deluge of AF models, the new automatic pipeline, and the expertise of human curators, we expect both to significantly improve ECOD and to evaluate the quality of AF models by (1) covering all known protein families in Pfam, (2) confirming remote homology via evolutionary intermediates, (3) comparing evolutionarily related experimental and predicted structures, and (4) resolving errors and inconsistencies through periodic quality checks.

Finally, in our third Aim, we will take the lead in making functional discoveries for biomedically important proteins classified by ECOD, studying virulence factors (VFs) in bacterial pathogens modelled in AFDB or studied by our experimental collaborators, the Orth lab. Fast-evolving VFs have been a challenge for structure prediction and for functional inference by sequence. We will identify candidate VFs in two dozen bacterial pathogens, obtain their structure models, and infer their function using similarities to known proteins in structure and functional sites. Promising hypotheses will be tested experimentally in the Orth lab through biochemical and genetic assays.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Richard Dustin Schaeffer

Similar Overseas Grants

EXCESS: The role of excess topography and peak ground acceleration on earthquake-preconditioning of landslides
  • Grant number:
    NE/Y000080/1
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Research Grant
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328975
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
SHINE: Origin and Evolution of Compressible Fluctuations in the Solar Wind and Their Role in Solar Wind Heating and Acceleration
  • Grant number:
    2400967
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Standard Grant
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328973
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
Market Entry Acceleration of the Murb Wind Turbine into Remote Telecoms Power
  • Grant number:
    10112700
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Collaborative R&D
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328972
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
Collaborative Research: A new understanding of droplet breakup: hydrodynamic instability under complex acceleration
  • Grant number:
    2332916
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Standard Grant
Collaborative Research: A new understanding of droplet breakup: hydrodynamic instability under complex acceleration
  • Grant number:
    2332917
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Standard Grant
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328974
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
Study of the Particle Acceleration and Transport in PWN through X-ray Spectro-polarimetry and GeV Gamma-ray Observations
  • Grant number:
    23H01186
  • Fiscal year:
    2023
  • Funding amount:
    $344,400
  • Project type:
    Grant-in-Aid for Scientific Research (B)