Increasing the Coverage and Accuracy of CATH for Comparative Genomics and Variant Interpretation
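Basic information
- Grant number: BB/R014892/1
- Principal investigator:
- Amount: $791,600
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2018
- Funding country: United Kingdom
- Duration: 2018 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract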
Evolution has given rise to families of protein domains where relatives are linked through speciation events or duplication events in the same genome. Extensive domain duplication and shuffling give multi-domain proteins with varying functions depending on the domain composition.

The CATH classification takes the domain as the primary evolutionary unit and classifies relatives having significantly similar structures and sequence patterns. Currently there are 5500 CATH superfamilies containing 93 million domains. Previous funding allowed us to hugely increase the number of domains in CATH. We want to keep increasing this data; even bigger expansions are expected as new technologies make it easier to solve structures and capture sequence data. We will improve the accuracy of our domain data by working with other classification experts (Alexey Murzin of SCOP) to establish a shared domain recognition platform for new domains at the European Bioinformatics Institute, with difficult assignments jointly validated by CATH/SCOP experts. This data will be public and valuable for other resources (e.g. SCOPe, ECOD).

CATH has been established for 22 years and is renowned for providing accurate structural annotations for biological analyses. More recently it significantly increased its value to the biology community by providing functional predictions. Although the structural core of the superfamily is highly conserved, variations away from the core cause changes in function. CATH addresses this by grouping evolutionary relatives likely to have highly similar functions and structures into functional families (FunFams). Thus FunFams can accurately inherit information about structures and functions between relatives. This is important as <10% of domains have been experimentally characterised. We verified in silico that FunFams can accurately model structures of uncharacterised relatives, and the ability of FunFams to inherit functional information between relatives has been validated by an international competition, CAFA. We will make the FunFams much more comprehensive and increase the accuracy of FunFams for enzymes.

Extending our FunFam library will allow us to predict more accurate multi-domain annotations in genome sequences. This will help biologists comparing the genomes of organisms occupying different environmental niches, as identification of diverse domain combinations can hint at changes in the functional repertoires of the organisms and different abilities to exploit compounds in their environments.

Because relatives in FunFams are so structurally conserved we can align and superpose them to extract the characteristics of this conserved structural core and use this information to build a '3D core-template'. These templates will help solve the structures of many more relatives since powerful new structural biology techniques (e.g. cryo-EM) can use core libraries like these to model the structures of uncharacterised proteins from electron dispersion data.

In another exciting development for CATH we will harness the structural data and the additional power that comes from 200-fold greater sequence data to find residue sites in the protein that are conserved throughout evolution for their functional importance. We will characterise these sites. We already predict functional sites well from conservation patterns in sequence data, but including structural data can help distinguish the type of site (e.g. a site binding a compound or another protein) and identify additional residues involved in the functional mechanism. This data is valuable for protein design and for understanding why mutations near these sites affect the protein and cause disease.

We will disseminate our data via webpages and other web mechanisms and develop e-videos and training material for the new features. We'll also build more efficient mechanisms for scanning our website and for biologists to install our tools on their own computers to analyse genome data.
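The abstract mentions predicting functional sites from conservation patterns in sequence data. As a rough illustration only, the sketch below scores each column of a multiple sequence alignment by normalised Shannon entropy and flags fully conserved columns. It is not the CATH/FunFam method; the function name `column_conservation`, the toy alignment, and the 0.99 threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return a conservation score in [0, 1] for every alignment column.

    Hypothetical helper: 1.0 means a single residue type in the column,
    lower values mean more variation. Gaps ("-") are ignored.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # assumes the 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: treat as unconserved
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment invented for illustration (not real FunFam data).
toy_alignment = ["MKHLC", "MRHLC", "MKHIC", "MQH-C"]
scores = column_conservation(toy_alignment)
conserved = [i for i, s in enumerate(scores) if s > 0.99]
print(conserved)
```

Entropy alone cannot tell what kind of site a conserved column is, which is the gap the abstract says structural data can help close.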
Project outputs
Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms.
- DOI: 10.1038/s42003-023-04488-9
- Published: 2023-02-08
- Journal:
- Impact factor: 5.9
- Authors:
- Corresponding author:
KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units.
- DOI: 10.3390/biom13020277
- Published: 2023-02-02
- Journal:
- Impact factor: 5.5
- Authors:
- Corresponding author:
Protein structure and function analyses to understand the implication of mutually exclusive splicing
- DOI: 10.1101/292813
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Lam S
- Corresponding author: Lam S
SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals.
- DOI: 10.1038/s41598-020-71936-5
- Published: 2020-10-05
- Journal:
- Impact factor: 4.6
- Authors: Lam SD; Bordin N; Waman VP; Scholes HM; Ashford P; Sen N; van Dorp L; Rauer C; Dawson NL; Pang CSM; Abbasian M; Sillitoe I; Edwards SJL; Fraternali F; Lees JG; Santini JM; Orengo CA
- Corresponding author: Orengo CA
Other publications by Christine Orengo
Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database
- DOI: 10.1016/j.str.2024.12.016
- Published: 2025-03-06
- Journal:
- Impact factor: 4.300
- Authors: Jialin Yin; Vaishali P. Waman; Neeladri Sen; Mohd Firdaus-Raih; Su Datt Lam; Christine Orengo
- Corresponding author: Christine Orengo
Progress towards mapping the universe of protein folds
- DOI: 10.1186/gb-2004-5-5-107
- Published: 2004-01-01
- Journal:
- Impact factor: 9.400
- Authors: Alastair Grant; David Lee; Christine Orengo
- Corresponding author: Christine Orengo
Predicting protein function from sequence and structure
- DOI: 10.1038/nrm2281
- Published: 2007-12-01
- Journal:
- Impact factor: 90.200
- Authors: David Lee; Oliver Redfern; Christine Orengo
- Corresponding author: Christine Orengo
Globalization: Approaches to Diversities
- DOI:
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Benoit H Dessailly; Natalie L Dawson; Kenji Mizuguchi; Christine Orengo; Hector Cuadra-Montiel
- Corresponding author: Hector Cuadra-Montiel
Other grants held by Christine Orengo
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y001117/1
- Fiscal year: 2024
- Amount: $791,600
- Project type: Research Grant
ProtFunAI: AI based methods for functional annotation of proteins in crop genomes
- Grant number: BB/Y514044/1
- Fiscal year: 2024
- Amount: $791,600
- Project type: Research Grant
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
- Grant number: BB/X018563/1
- Fiscal year: 2024
- Amount: $791,600
- Project type: Research Grant
Transforming the Structural Landscape of CATH to Aid Variant Analyses in Human and Agricultural Organisms and their Pathogens
- Grant number: BB/W018802/1
- Fiscal year: 2022
- Amount: $791,600
- Project type: Research Grant
Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies
- Grant number: BB/V014722/1
- Fiscal year: 2022
- Amount: $791,600
- Project type: Research Grant
CATH-FunVar - Predicting Viral and Human Variants Affecting COVID-19 Susceptibility and Severity and Repurposing Therapeutics
- Grant number: BB/W003368/1
- Fiscal year: 2021
- Amount: $791,600
- Project type: Research Grant
3D-Gateway - Gateway to protein structure and function
- Grant number: BB/S020144/1
- Fiscal year: 2020
- Amount: $791,600
- Project type: Research Grant
Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam
- Grant number: BB/S020039/1
- Fiscal year: 2020
- Amount: $791,600
- Project type: Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions, using informatics target selection and high-throughput validation
- Grant number: BB/T002735/1
- Fiscal year: 2020
- Amount: $791,600
- Project type: Research Grant
BBSRC-NSF/BIO Expanding the fold library in the twilight zone to facilitate structure determination of macromolecular machines
- Grant number: BB/S016007/1
- Fiscal year: 2020
- Amount: $791,600
- Project type: Research Grant
Similar overseas grants
NSF Convergence Accelerator Track K: Unraveling the Benefits, Costs, and Equity of Tree Coverage in Desert Cities
- Grant number: 2344472
- Fiscal year: 2024
- Amount: $791,600
- Project type: Standard Grant
A mobile health solution in combination with behavioral change approach to improve vaccination coverage and timeliness in Bangladesh: A cluster randomized control trial
- Grant number: 24K20168
- Fiscal year: 2024
- Amount: $791,600
- Project type: Grant-in-Aid for Early-Career Scientists
FlexiFone: Improving Mobile Network Coverage across the UK
- Grant number: 10113842
- Fiscal year: 2024
- Amount: $791,600
- Project type: Collaborative R&D
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
- Grant number: BB/X018563/1
- Fiscal year: 2024
- Amount: $791,600
- Project type: Research Grant
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods
- Grant number: BB/X018660/1
- Fiscal year: 2024
- Amount: $791,600
- Project type: Research Grant
Framing poverty: Japan's media coverage of poverty and its social implications on anti-poverty policy
- Grant number: 24K05296
- Fiscal year: 2024
- Amount: $791,600
- Project type: Grant-in-Aid for Scientific Research (C)
Informing the design of policies aimed at increasing health financial protection to accelerate universal health coverage in West Africa
- Grant number: 23K24567
- Fiscal year: 2024
- Amount: $791,600
- Project type: Grant-in-Aid for Scientific Research (B)
Collaborative Research: NSFGEO-NERC: MEZCAL: Methods for Extending the horiZontal Coverage of the Amoc Latitudinally and back in time (MEZCAL)
- Grant number: 2409764
- Fiscal year: 2023
- Amount: $791,600
- Project type: Standard Grant
Collaborative Research: III: Medium: Algorithms for scalable inference and phylodynamic analysis of tumor haplotypes using low-coverage single cell sequencing data
- Grant number: 2415562
- Fiscal year: 2023
- Amount: $791,600
- Project type: Standard Grant
Impact of Medicaid Postpartum Coverage Extension and Mandated Postpartum Depression Screening on Care for Gestational Diabetes and Pregnancy-Induced Hypertension
- Grant number: 10749378
- Fiscal year: 2023
- Amount: $791,600
- Project type: