18-BBSRC-NSF/BIO: CIBR: Implementing an explicit phylogenetic framework for large-scale protein sequence annotation
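Basic information
- Grant number: BB/T010541/1
- Principal investigator:
- Amount: $512,500
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2020
- Funding country: United Kingdom
- Duration: 2020 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract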
Proteins are the primary molecular machines that perform the instructions encoded in our genomes. Proteins ultimately shape the response of our cells, tissues, organs, and bodies to the surrounding environment, either directly (e.g. muscle contraction) or through their functional outputs (e.g. the electrical signals that travel along dendrites to produce a nerve impulse or action potential). Therefore, understanding the functional role(s) performed by each protein is critical to research and development in many areas of science, particularly biology, medicine and applied biotechnology. The rapid increase in throughput of next-generation sequencing technologies has important ramifications: our ability to sequence an organism's genome and determine the proteins it encodes far outpaces our ability to experimentally characterise the function of a protein. Thus, for every functionally characterised protein, there are now many thousands of proteins that will never be experimentally characterised. Molecular biology increasingly relies on our ability to computationally group related sequences and to transfer functional annotations from the few experimentally characterised proteins to related, yet uncharacterised, proteins. Knowledge about proteins has been collected and stored in public databases such as UniProt, a world-leading resource on protein sequences and function. Currently, there are over 150 million sequences in UniProt, with the number doubling every two years. Therefore, it is crucial to develop new and reliable computational methods for inferring protein function that can be scaled to billions of sequences. We aim to implement an annotation system that incorporates evolutionary information, permitting the level of annotation transfer to be tuned accordingly, while also ensuring scalability and speed of annotation that meets current and future demands.
This new annotation system will integrate the most innovative features of two pre-existing methods that are currently used to produce world-class resources. The Gene Ontology (GO) Consortium has developed software for explicit evolutionary modelling of GO annotation gain and loss along specific branches of phylogenetic trees, and has applied it to inferring GO annotations for experimentally uncharacterised proteins. UniProt has developed the UniRule system, which applies annotation "rules" that combine information on protein families and domains (from the InterPro resource) with other types of information, such as taxonomy, to make more precise and informative annotations. Our goal is to create a next-generation, large-scale annotation system that merges the two approaches, and to implement this annotation system in the UniProt resource, thereby increasing the quality of functional annotations in the database for the benefit of the scientific community. We propose three specific aims to achieve this goal: (1) convert existing UniRule rules into explicit evolutionary models, (2) integrate software to apply the evolutionary models (TreeGrafter) into the UniProt annotation pipeline, and (3) develop software for ongoing curation of new evolutionary models of additional annotation types and protein families. The result will be an annotation pipeline based on explicit evolutionary principles, which will enable seamless sharing of information between the UniProt and GO curation processes, and substantially improve the accuracy, comprehensiveness and informativeness of inferred protein annotations in public databases.
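The idea of modelling annotation gain and loss along branches of a phylogenetic tree can be illustrated with a toy example. The following is a minimal sketch only, not the GO Consortium's software, TreeGrafter, or UniRule: the `Node` class, the `infer_leaf_annotations` function, the tree shape, the protein names and the term labels are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy phylogenetic tree (hypothetical structure)."""
    name: str
    # Annotations gained on the branch leading to this node.
    gained: frozenset = frozenset()
    # Annotations lost on the branch leading to this node.
    lost: frozenset = frozenset()
    children: list = field(default_factory=list)


def infer_leaf_annotations(node, inherited=frozenset()):
    """Propagate annotations from the root down to the leaves.

    Each node inherits its parent's annotations, then applies the
    loss and gain events recorded on its own branch.
    """
    current = (inherited - node.lost) | node.gained
    if not node.children:
        return {node.name: current}
    result = {}
    for child in node.children:
        result.update(infer_leaf_annotations(child, current))
    return result


# Illustrative tree: a function is gained in one ancestor and
# later lost again in one of its descendants.
tree = Node("root", children=[
    Node("ancestor_A", gained=frozenset({"kinase activity"}), children=[
        Node("protein_1"),
        Node("protein_2", lost=frozenset({"kinase activity"})),
    ]),
    Node("protein_3", gained=frozenset({"ATP binding"})),
])

annotations = infer_leaf_annotations(tree)
for leaf, terms in sorted(annotations.items()):
    print(leaf, sorted(terms))
```

In this sketch, `protein_1` inherits the annotation gained by its ancestor, while `protein_2` does not, because a loss event is recorded on its branch. This is the kind of branch-specific tuning of annotation transfer that a purely family-based rule cannot express.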
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB.
- DOI: 10.3390/metabo11010048
- Published: 2021-01-12
- Journal:
- Impact factor: 4.1
- Authors: Feuermann M;Boutet E;Morgat A;Axelsen KB;Bansal P;Bolleman J;de Castro E;Coudert E;Gasteiger E;Géhant S;Lieberherr D;Lombardot T;Neto TB;Pedruzzi I;Poux S;Pozzato M;Redaschi N;Bridge A;on behalf of the UniProt Consortium
- Corresponding author: on behalf of the UniProt Consortium
UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase.
- DOI: 10.1093/bioinformatics/btaa485
- Published: 2020-11-01
- Journal:
- Impact factor: 0
- Authors: MacDougall A;Volynkin V;Saidi R;Poggioli D;Zellner H;Hatton-Ellis E;Joshi V;O'Donovan C;Orchard S;Auchincloss AH;Baratin D;Bolleman J;Coudert E;de Castro E;Hulo C;Masson P;Pedruzzi I;Rivoire C;Arighi C;Wang Q;Chen C;Huang H;Garavelli J;Vinayaka CR;Yeh LS;Natale DA;Laiho K;Martin MJ;Renaux A;Pichler K;UniProt Consortium
- Corresponding author: UniProt Consortium
The InterPro protein families and domains database: 20 years on.
- DOI: 10.1093/nar/gkaa977
- Published: 2021-01-08
- Journal:
- Impact factor: 14.9
- Authors: Blum M;Chang HY;Chuguransky S;Grego T;Kandasaamy S;Mitchell A;Nuka G;Paysan-Lafosse T;Qureshi M;Raj S;Richardson L;Salazar GA;Williams L;Bork P;Bridge A;Gough J;Haft DH;Letunic I;Marchler-Bauer A;Mi H;Natale DA;Necci M;Orengo CA;Pandurangan AP;Rivoire C;Sigrist CJA;Sillitoe I;Thanki N;Thomas PD;Tosatto SCE;Wu CH;Bateman A;Finn RD
- Corresponding author: Finn RD
UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship.
- DOI: 10.1016/j.mcpro.2023.100591
- Published: 2023-08
- Journal:
- Impact factor: 7
- Authors: Bowler-Barnett, E. H.;Fan, J.;Luo, J.;Magrane, M.;Martin, M. J.;Orchard, S.
- Corresponding author: Orchard, S.
Searching and Navigating UniProt Databases.
- DOI: 10.1002/cpz1.700
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Lussi, Yvonne C;Magrane, Michele;Martin, Maria J;Orchard, Sandra;UniProt Consortium
- Corresponding author: UniProt Consortium
Other publications by Maria J. Martin
Processes and outcomes in student teamwork. An empirical study in a marketing subject
- DOI: 10.1080/03075079.2014.926319
- Published: 2016
- Journal:
- Impact factor: 4.2
- Authors: Rafael Bravo;L. Lucia;Maria J. Martin
- Corresponding author: Maria J. Martin
Applying dynamic balancing to improve the performance of MPI parallel genomics applications
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Alejandro Fernández;J. González;Maria J. Martin
- Corresponding author: Maria J. Martin
Quantitative and qualitative methods of evaluating response to biologics in severe asthma patients: Results from a real-world study
- DOI: 10.1016/j.jaip.2022.11.009
- Published: 2023-03-01
- Journal:
- Impact factor:
- Authors: Miguel Estravís;Jacqueline Pérez-Pazos;Maria J. Martin;Jacinto Ramos-González;María Gil-Melcón;Cristina Martín-García;Asunción García-Sánchez;Catalina Sanz;Ignacio Dávila
- Corresponding author: Ignacio Dávila
Similar overseas grants
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y000455/1
- Fiscal year: 2024
- Amount: $512,500
- Project type: Research Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Grant number: BB/Y001117/1
- Fiscal year: 2024
- Amount: $512,500
- Project type: Research Grant
22-BBSRC/NSF-BIO Building synthetic regulatory units to understand the complexity of mammalian gene expression
- Grant number: BB/Y008898/1
- Fiscal year: 2024
- Amount: $512,500
- Project type: Research Grant
20-BBSRC/NSF-BIO Regulatory control of innate immune response in marine invertebrates
- Grant number: BB/W017865/1
- Fiscal year: 2024
- Amount: $512,500
- Project type: Research Grant
22-BBSRC/NSF-BIO - Interpretable & Noise-robust Machine Learning for Neurophysiology
- Grant number: BB/Y008758/1
- Fiscal year: 2024
- Amount: $512,500
- Project type: Research Grant
22-BBSRC/NSF-BIO: Community-dependent CRISPR-cas evolution and robust community function
- Grant number: BB/Y008774/1
- Fiscal year: 2024
- Amount: $512,500
- Project type: Research Grant
UKRI/BBSRC-NSF/BIO: Interpretable and Noise-Robust Machine Learning for Neurophysiology
- Grant number: 2321840
- Fiscal year: 2023
- Amount: $512,500
- Project type: Continuing Grant
UKRI/BBSRC-NSF/BIO:Hidden costs of infection: mechanisms by which parasites disrupt host-microbe symbioses and alter development
- Grant number: 2322173
- Fiscal year: 2023
- Amount: $512,500
- Project type: Continuing Grant
21-BBSRC/NSF-BIO: Developing large serine integrases as tools for constructing and manipulating synthetic replicons.
- Grant number: BB/X012085/1
- Fiscal year: 2023
- Amount: $512,500
- Project type: Research Grant
UKRI/BBSRC-NSF/BIO Determining the Roles of Fusarium Effector Proteases in Plant Pathogenesis
- Grant number: BB/X012131/1
- Fiscal year: 2023
- Amount: $512,500
- Project type: Research Grant