Increasing the Coverage and Accuracy of CATH for Comparative Genomics and Variant Interpretation

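Basic information

Grant number:
BB/R014892/1
Principal investigator:
Christine Orengo
Amount:
$791,600
Host institution:
University College London
Host institution country:
United Kingdom
Project type:
Research Grant
Fiscal year:
2018
Funding country:
United Kingdom
Start and end dates:
2018 to (no data)
Project status:
Completed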

Source:
https://gtr.ukri.org/projects?ref=BB%2FR014892%2F1
Keywords:
Increasing Coverage Accuracy CATH Comparative

Project abstract

Evolution has given rise to families of protein domains whose relatives are linked through speciation events or through duplication events within the same genome. Extensive domain duplication and shuffling gives rise to multi-domain proteins whose functions vary with their domain composition.

The CATH classification takes the domain as the primary evolutionary unit and classifies relatives that have significantly similar structures and sequence patterns. Currently there are 5500 CATH superfamilies containing 93 million domains. Previous funding allowed us to hugely increase the number of domains in CATH. We want to keep expanding these data, and even bigger expansions are expected as new technologies make it easier to solve structures and capture sequence data. We will improve the accuracy of our domain data by working with other classification experts (Alexey Murzin of SCOP) to establish a shared domain recognition platform for new domains at the European Bioinformatics Institute, with difficult assignments jointly validated by CATH/SCOP experts. These data will be public and valuable for other resources (e.g. SCOPe, ECOD).

CATH has been established for 22 years and is renowned for providing accurate structural annotations for biological analyses. More recently it significantly increased its value to the biology community by providing functional predictions. Although the structural core of a superfamily is highly conserved, variations away from the core cause changes in function. CATH addresses this by grouping evolutionary relatives likely to have highly similar functions and structures into functional families (FunFams). Information about structures and functions can therefore be inherited accurately between relatives within a FunFam. This is important because fewer than 10% of domains have been experimentally characterised.

We verified in silico that FunFams can accurately model the structures of uncharacterised relatives, and the ability of FunFams to inherit functional information between relatives has been validated by an international competition, CAFA. We will make the FunFams much more comprehensive and increase the accuracy of FunFams for enzymes.

Extending our FunFam library will allow us to predict more accurate multi-domain annotations in genome sequences. This will help biologists comparing the genomes of organisms occupying different environmental niches, as the identification of diverse domain combinations can hint at changes in the functional repertoires of the organisms and at different abilities to exploit compounds in their environments.

Because relatives in FunFams are so structurally conserved, we can align and superpose them to extract the characteristics of the conserved structural core and use this information to build a '3D core-template'. These templates will help solve the structures of many more relatives, since powerful new structural biology techniques (e.g. cryo-EM) can use core libraries like these to model the structures of uncharacterised proteins from electron dispersion data.

In another exciting development for CATH, we will harness the structural data, together with the additional power that comes from a 200-fold greater volume of sequence data, to find residue sites in the protein that have been conserved throughout evolution for their functional importance. We will characterise these sites. We already predict functional sites well from conservation patterns in sequence data, but including structural data can help distinguish the type of site (e.g. a site binding a compound or another protein) and identify additional residues involved in the functional mechanism. These data are valuable for protein design and for understanding why mutations near these sites affect the protein and cause disease.

We will disseminate our data via webpages and other web mechanisms, and develop e-videos and training material for the new features. We will also build more efficient mechanisms for scanning our website and for biologists to install our tools on their own computers to analyse genome data.

Project outputs

Journal articles (7)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms.

DOI:
10.1038/s42003-023-04488-9
Publication date:
2023-02-08
Journal:
Communications Biology
Impact factor:
5.9
Authors:
Corresponding author:

KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units.

DOI:
10.3390/biom13020277
Publication date:
2023-02-02
Journal:
Biomolecules
Impact factor:
5.5
Authors:
Corresponding author:

Protein structure and function analyses to understand the implication of mutually exclusive splicing

DOI:
10.1101/292813
Publication date:
2018
Journal:
Impact factor:
0
Authors:
Lam S
Corresponding author:
Lam S

SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals.

DOI:
10.1038/s41598-020-71936-5
Publication date:
2020-10-05
Journal:
Scientific Reports
Impact factor:
4.6
Authors:
Lam SD;Bordin N;Waman VP;Scholes HM;Ashford P;Sen N;van Dorp L;Rauer C;Dawson NL;Pang CSM;Abbasian M;Sillitoe I;Edwards SJL;Fraternali F;Lees JG;Santini JM;Orengo CA
Corresponding author:
Orengo CA

Other publications by Christine Orengo

Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database

DOI:
10.1016/j.str.2024.12.016
Publication date:
2025-03-06
Journal:
Structure
Impact factor:
4.300
Authors:
Jialin Yin;Vaishali P. Waman;Neeladri Sen;Mohd Firdaus-Raih;Su Datt Lam;Christine Orengo
Corresponding author:
Christine Orengo