Exploiting High Performance Computing to Provide Functional Annotations via CATH-Gene3D

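Basic information

  • Grant number:
    BB/H02364X/1
  • Principal investigator:
  • Amount:
    $138,800
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2010
  • Funding country:
    United Kingdom
  • Period:
    2010 to (no data)
  • Project status:
    Completed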

Project abstract

Over the last ten years there have been intense efforts to determine the protein compositions of different organisms, including human and other model organisms from all kingdoms of life. Currently, more than 1,000 organisms have been completely sequenced and nearly 10 million protein sequences determined. In 2000 the human genome was completed, and the latest estimates say it contains between 23,000 and 25,000 protein-coding genes. It is difficult, expensive and time-consuming to determine the functional properties of all these proteins, and for many organisms, including human, fewer than 15% of the proteins have been directly characterised by experiment to determine their function. Therefore, a major challenge for bioinformatics groups has been to devise computational methods for inferring the functions of proteins. Most predictive methods exploit the premise that proteins in different species are related to each other (homologues) because they have evolved from a common ancestral protein. These homologous proteins frequently share similar functional properties, conserved during evolution. Therefore, many methods search for similarities in the sequences of proteins, indicative of an evolutionary relationship, which then allows functional information to be inherited. In other words, a protein that has been experimentally characterised in fly, for example, can be used to assign functional properties to an evolutionarily related protein identified in human. The main challenge faced by these approaches is that gene duplication occurs in all organisms throughout evolution. Therefore, as well as the original copy of a protein, derived from an ancestral protein, there can be additional copies which may have evolved slightly modified functions to expand the functional repertoire of the organism, thereby enhancing its survival.
We have developed a resource (CATH-Gene3D) which groups proteins into evolutionary families on the basis of similarities in their 3D structures (where available) and their sequences. Currently, more than 2,200 families are classified in CATH-Gene3D, accounting for the majority of protein domain sequences. Some of these families contain a very large number of sequences because the proteins have been highly duplicated in organisms. These families pose a challenge to function prediction methods as the functions of the relatives have frequently diverged. We have designed a new method (GeMMA) which uses a sophisticated approach for comparing sets of evolutionarily related sequences to group them into subfamilies of proteins that are very likely to share functional properties. Whilst GeMMA has been shown to be accurate in transferring functional information between relatives, it can take a long time to run for the very large families in CATH-Gene3D. Therefore, to speed it up, this project will modify the GeMMA protocol so that we can run it on a wide range of publicly available HPC resources. We will also develop highly intuitive web pages to make the information provided by the GeMMA subfamilies very accessible for the biology community. This web site will also allow biologists to submit a query protein of unknown function, which will then be searched against the GeMMA subfamilies to predict a putative function. CATH-Gene3D is already widely used by biologists, and this new functional sub-classification will make the resource even more valuable to these researchers by providing more precise functional annotations for the novel proteins they are studying.
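
The idea in the abstract, grouping related sequences into subfamilies and then matching a query protein against them to inherit a function, can be illustrated with a toy sketch. This is not the GeMMA protocol: the k-mer Jaccard similarity, the greedy merging rule, the threshold, the function names and the made-up sequences and annotation below are all assumptions chosen only to keep the example small and runnable.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_subfamilies(seqs, threshold=0.5, k=3):
    """Greedy agglomerative clustering: repeatedly merge the most similar
    pair of clusters until no pair scores at or above `threshold`."""
    clusters = [{"members": [name], "kmers": kmers(s, k)} for name, s in seqs.items()]
    while len(clusters) > 1:
        (i, j), score = max(
            (((i, j), jaccard(clusters[i]["kmers"], clusters[j]["kmers"]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if score < threshold:
            break
        merged = {
            "members": clusters[i]["members"] + clusters[j]["members"],
            "kmers": clusters[i]["kmers"] | clusters[j]["kmers"],
        }
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


def predict_function(query, clusters, annotations, k=3):
    """Assign the query to its best-matching cluster and return the
    annotations known for that cluster's members."""
    q = kmers(query, k)
    best = max(clusters, key=lambda c: jaccard(q, c["kmers"]))
    return sorted({annotations[m] for m in best["members"] if m in annotations})


# Hypothetical toy data: invented sequences, not real proteins.
seqs = {
    "fly_kinase": "MKTAYIAKQR",
    "human_kinase": "MKTAYIAKQK",
    "yeast_transporter": "GLLVWFFPES",
}
annotations = {"fly_kinase": "protein kinase activity"}  # only one member is characterised

clusters = cluster_subfamilies(seqs)
print(predict_function("MKTAYIAKQQ", clusters, annotations))
# prints ['protein kinase activity']
```

Comparing every pair of clusters at every merge step is what makes this kind of procedure slow for very large families, which is the motivation the abstract gives for moving the computation to HPC resources.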

Project outputs

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains.
  • DOI:
    10.1093/nar/gkp1049
  • Published:
    2010-01
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Lee DA;Rentzsch R;Orengo C
  • Corresponding author:
    Orengo C
A large-scale evaluation of computational protein function prediction.
  • DOI:
    10.1038/nmeth.2340
  • Published:
    2013-03
  • Journal:
  • Impact factor:
    48
  • Authors:
    Radivojac, Predrag;Clark, Wyatt T.;Oron, Tal Ronnen;Schnoes, Alexandra M.;Wittkop, Tobias;Sokolov, Artem;Graim, Kiley;Funk, Christopher;Verspoor, Karin;Ben-Hur, Asa;Pandey, Gaurav;Yunes, Jeffrey M.;Talwalkar, Ameet S.;Repo, Susanna;Souza, Michael L.;Piovesan, Damiano;Casadio, Rita;Wang, Zheng;Cheng, Jianlin;Fang, Hai;Gough, Julian;Koskinen, Patrik;Toronen, Petri;Nokso-Koivisto, Jussi;Holm, Liisa;Cozzetto, Domenico;Buchan, Daniel W. A.;Bryson, Kevin;Jones, David T.;Limaye, Bhakti;Inamdar, Harshal;Datta, Avik;Manjari, Sunitha K.;Joshi, Rajendra;Chitale, Meghana;Kihara, Daisuke;Lisewski, Andreas M.;Erdin, Serkan;Venner, Eric;Lichtarge, Olivier;Rentzsch, Robert;Yang, Haixuan;Romero, Alfonso E.;Bhat, Prajwal;Paccanaro, Alberto;Hamp, Tobias;Kassner, Rebecca;Seemayer, Stefan;Vicedo, Esmeralda;Schaefer, Christian;Achten, Dominik;Auer, Florian;Boehm, Ariane;Braun, Tatjana;Hecht, Maximilian;Heron, Mark;Hoenigschmid, Peter;Hopf, Thomas A.;Kaufmann, Stefanie;Kiening, Michael;Krompass, Denis;Landerer, Cedric;Mahlich, Yannick;Roos, Manfred;Bjorne, Jari;Salakoski, Tapio;Wong, Andrew;Shatkay, Hagit;Gatzmann, Fanny;Sommer, Ingolf;Wass, Mark N.;Sternberg, Michael J. E.;Skunca, Nives;Supek, Fran;Bosnjak, Matko;Panov, Pance;Dzeroski, Saso;Smuc, Tomislav;Kourmpetis, Yiannis A. I.;van Dijk, Aalt D. J.;ter Braak, Cajo J. F.;Zhou, Yuanpeng;Gong, Qingtian;Dong, Xinran;Tian, Weidong;Falda, Marco;Fontana, Paolo;Lavezzo, Enrico;Di Camillo, Barbara;Toppo, Stefano;Lan, Liang;Djuric, Nemanja;Guo, Yuhong;Vucetic, Slobodan;Bairoch, Amos;Linial, Michal;Babbitt, Patricia C.;Brenner, Steven E.;Orengo, Christine;Rost, Burkhard;Mooney, Sean D.;Friedberg, Iddo
  • Corresponding author:
    Friedberg, Iddo
Protein function prediction using domain families.
  • DOI:
    10.1186/1471-2105-14-s3-s5
  • Published:
    2013
  • Journal:
  • Impact factor:
    3
  • Authors:
    Rentzsch R;Orengo CA
  • Corresponding author:
    Orengo CA

Other publications by Christine Orengo

Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database
  • DOI:
    10.1016/j.str.2024.12.016
  • Published:
    2025-03-06
  • Journal:
  • Impact factor:
    4.300
  • Authors:
    Jialin Yin;Vaishali P. Waman;Neeladri Sen;Mohd Firdaus-Raih;Su Datt Lam;Christine Orengo
  • Corresponding author:
    Christine Orengo
Progress towards mapping the universe of protein folds
  • DOI:
    10.1186/gb-2004-5-5-107
  • Published:
    2004-01-01
  • Journal:
  • Impact factor:
    9.400
  • Authors:
    Alastair Grant;David Lee;Christine Orengo
  • Corresponding author:
    Christine Orengo
Predicting protein function from sequence and structure
  • DOI:
    10.1038/nrm2281
  • Published:
    2007-12-01
  • Journal:
  • Impact factor:
    90.200
  • Authors:
    David Lee;Oliver Redfern;Christine Orengo
  • Corresponding author:
    Christine Orengo
Globalization: Approaches to Diversities
  • DOI:
  • Published:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    Benoit H Dessailly;Natalie L Dawson;Kenji Mizuguchi;Christine Orengo;Hector Cuadra-Montiel
  • Corresponding author:
    Hector Cuadra-Montiel

Other grants held by Christine Orengo

BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
  • Grant number:
    BB/Y001117/1
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
ProtFunAI: AI based methods for functional annotation of proteins in crop genomes
  • Grant number:
    BB/Y514044/1
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
  • Grant number:
    BB/X018563/1
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
Transforming the Structural Landscape of CATH to Aid Variant Analyses in Human and Agricultural Organisms and their Pathogens
  • Grant number:
    BB/W018802/1
  • Fiscal year:
    2022
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies
  • Grant number:
    BB/V014722/1
  • Fiscal year:
    2022
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
CATH-FunVar - Predicting Viral and Human Variants Affecting COVID-19 Susceptibility and Severity and Repurposing Therapeutics
  • Grant number:
    BB/W003368/1
  • Fiscal year:
    2021
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
3D-Gateway - Gateway to protein structure and function
  • Grant number:
    BB/S020144/1
  • Fiscal year:
    2020
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam
  • Grant number:
    BB/S020039/1
  • Fiscal year:
    2020
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
SENSE - Screening of ENvironmental SEquences to discover novel protein functions, using informatics target selection and high-throughput validation
  • Grant number:
    BB/T002735/1
  • Fiscal year:
    2020
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
BBSRC-NSF/BIO Expanding the fold library in the twilight zone to facilitate structure determination of macromolecular machines
  • Grant number:
    BB/S016007/1
  • Fiscal year:
    2020
  • Funding amount:
    $138,800
  • Project category:
    Research Grant

Similar overseas grants

Planning: Artificial Intelligence Assisted High-Performance Parallel Computing for Power System Optimization
  • Grant number:
    2414141
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
The Kelvin Living Lab: Towards Net Zero High-Performance Computing
  • Grant number:
    EP/Z531054/1
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
CC* CIRA: High-performance computing solutions for small Midwest institutions
  • Grant number:
    2346616
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
MRI: Track 1 Acquisition of a High-Performance Computing System at New Mexico Tech
  • Grant number:
    2320162
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
  • Grant number:
    2403399
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
Malleability in resource allocation for improved system efficiency in high-performance computing
  • Grant number:
    EP/Y53061X/1
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Research Grant
Equipment: CC* Campus Compute: A High-Performance Computing System for Research and Education in Arkansas
  • Grant number:
    2346752
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
Collaborative Research: OAC: Core: Harvesting Idle Resources Safely and Timely for Large-scale AI Applications in High-Performance Computing Systems
  • Grant number:
    2403398
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
REU Site: High Performance Computing (HPC) Tools, Techniques, and Research across the Physical Sciences
  • Grant number:
    2348782
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant
REU Site: Research on Computational Methods in High Performance Computing and Their Applications to Computational Sciences
  • Grant number:
    2348884
  • Fiscal year:
    2024
  • Funding amount:
    $138,800
  • Project category:
    Standard Grant