ECOD: Large scale classification of predicted and experimental protein structures

Basic Information

Project Summary

Classification of protein domains has historically served to contextualize the 3D structural data generated by experimental structure determination methods such as X-ray crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. Our database, Evolutionary Classification of protein Domains (ECOD), has served the biological community for seven years, cataloguing evolutionary relationships between domains from experimental structures. The recent advent of high-accuracy structure prediction methods, such as AlphaFold (AF) and RoseTTAFold (RF), and the consequent release of 1 million predicted structures in the AlphaFold Database (AFDB) herald a paradigm shift in structural biology and domain classification. The rate of structure deposition is expected to jump a hundredfold to a thousandfold. We propose to take advantage of this revolution and transform ECOD into a comprehensive classification of the entire protein universe using sequence, structure, and functional evidence. By simultaneously classifying experimental and predicted structures of proteins from model organisms and human pathogens, our classification will help the scientific community critically evaluate structure models and use the evolutionary information to discover and experimentally characterize protein function.

Classifying AF models challenges the ECOD pipeline with a 50-fold increase in workload and with the significant fraction of non-globular and low-quality regions in the models. Thus, our first Aim is to upgrade ECOD's infrastructure and develop methods to identify single domains from AF models and to integrate sequence, structure, and functional site similarities into our automatic classification. Compared to the current ECOD workflow, which relies on human experts for structure- and function-based classification, these improvements will drastically decrease the need for manual curation and will allow us to achieve our second Aim: classifying domains of over 1 million released AF models into ECOD via a combination of computational pipelines and minimal manual effort (0.25% to 1% of cases). Using the deluge of AF models, the new automatic pipeline, and the expertise of human curators, we expect both to significantly improve ECOD and to evaluate the quality of AF models by (1) covering all known protein families in Pfam, (2) confirming remote homology via evolutionary intermediates, (3) comparing evolutionarily related experimental and predicted structures, and (4) resolving errors and inconsistencies through periodic quality checks.

Finally, in our third Aim, we will take the lead in making functional discoveries for biomedically important proteins classified by ECOD, studying virulence factors (VFs) in bacterial pathogens modelled in AFDB or studied by our experimental collaborators, the Orth lab. Fast-evolving VFs were a challenge for both structure prediction and sequence-based functional inference. We will identify candidate VFs in two dozen bacterial pathogens, obtain their structure models, and infer their function using similarities to known proteins in structure and functional sites. Promising hypotheses will be tested experimentally in the Orth lab through biochemical and genetic assays.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Richard Dustin Schaeffer

Similar overseas grants

EXCESS: The role of excess topography and peak ground acceleration on earthquake-preconditioning of landslides
  • Grant number:
    NE/Y000080/1
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Research Grant
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328975
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
SHINE: Origin and Evolution of Compressible Fluctuations in the Solar Wind and Their Role in Solar Wind Heating and Acceleration
  • Grant number:
    2400967
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Standard Grant
Market Entry Acceleration of the Murb Wind Turbine into Remote Telecoms Power
  • Grant number:
    10112700
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Collaborative R&D
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328973
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328972
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
Collaborative Research: A new understanding of droplet breakup: hydrodynamic instability under complex acceleration
  • Grant number:
    2332916
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Standard Grant
Collaborative Research: A new understanding of droplet breakup: hydrodynamic instability under complex acceleration
  • Grant number:
    2332917
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Standard Grant
Collaborative Research: FuSe: R3AP: Retunable, Reconfigurable, Racetrack-Memory Acceleration Platform
  • Grant number:
    2328974
  • Fiscal year:
    2024
  • Funding amount:
    $344,400
  • Project type:
    Continuing Grant
Study of the Particle Acceleration and Transport in PWN through X-ray Spectro-polarimetry and GeV Gamma-ray Observations
  • Grant number:
    23H01186
  • Fiscal year:
    2023
  • Funding amount:
    $344,400
  • Project type:
    Grant-in-Aid for Scientific Research (B)