Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods

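Basic information

  • Grant number:
    BB/X018660/1
  • Principal investigator:
  • Amount:
    $957,500
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2024
  • Funding country:
    United Kingdom
  • Duration:
    2024 to (no data)
  • Project status:
    Ongoing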

Project abstract

Proteins are macromolecules responsible for biological processes in the cell. At their most basic level, they consist of a sequence of amino acids, determined by the sequence of nucleotides (the ATGC building blocks of life) in a gene. Proteins usually fold into three-dimensional structures, allowing them to interact with other molecules and perform their functions. Recent advances in sequencing technologies have led to a substantial accumulation of protein data, and our capacity to generate new protein sequences has surpassed our ability to fully understand their functions. It is therefore crucial to develop computational methods that identify sequence or structural similarities between characterised and uncharacterised proteins, so that functional information can be transferred from the former to the latter.

InterPro, Pfam and FunFam are world-leading, UK-based resources that group similar protein sequences together, forming protein families. Pfam is a collection of protein domain families containing functional annotations. FunFam focuses on protein structural domains that share a common function. InterPro merges information from 13 expert protein databases, including Pfam and FunFam, into a single searchable resource, and further annotates protein families.

In the past few years, Artificial Intelligence methods have been applied successfully to several biological problems. For instance, DeepMind's AlphaFold has revolutionised the prediction of how protein sequences fold into three-dimensional structures. Our collaborators are developing several promising tools that use Deep Learning (DL) to better identify protein families. These methods outperform current state-of-the-art approaches in terms of accuracy, coverage and computing efficiency, which also makes them more environmentally sustainable.

In this ambitious project, we will improve the efficiency, accuracy, and sustainability of InterPro, Pfam and FunFam. We will accomplish this by reducing the technical debt of Pfam, which was established almost three decades ago, by adopting DL approaches to enhance the classification of protein sequences into families, and by significantly reducing the carbon footprint of sequence annotation. Finally, we will improve the annotation of agriculturally important plant pathogens, resulting in the creation of hundreds of additional InterPro and Pfam entries.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Alex Bateman

Bioinformatics Applications Note Databases and Ontologies Codex: Exploration of Semantic Changes between Ontology Versions
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Michael Hartung; Anika Groß; E. Rahm; Alex Bateman
  • Corresponding author:
    Alex Bateman
Bioinformatics Advance Access published May 31, 2007
  • DOI:
    10.1007/s10015-009-0735-5
  • Publication date:
    2007
  • Journal:
  • Impact factor:
    0.9
  • Authors:
    Alex Bateman
  • Corresponding author:
    Alex Bateman

Other grants by Alex Bateman

UKRI/BBSRC-NSF/BIO: Unifying Pfam protein sequence and ECOD structural classifications with structure models
  • Grant number:
    BB/X012492/1
  • Fiscal year:
    2023
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam
  • Grant number:
    BB/S020381/1
  • Fiscal year:
    2019
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Rfam: The community resource for RNA families
  • Grant number:
    BB/S020462/1
  • Fiscal year:
    2019
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
RNAcentral, the RNA sequence database
  • Grant number:
    BB/N019199/1
  • Fiscal year:
    2017
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Rfam: Towards a sustainable resource for understanding the genomic functional ncRNA repertoire
  • Grant number:
    BB/M011690/1
  • Fiscal year:
    2015
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Keeping pace with protein sequence annotation; consolidating and enhancing Pfam and InterPro's methodologies for functional prediction
  • Grant number:
    BB/L024136/1
  • Fiscal year:
    2014
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
The RNAcentral database of non-coding RNAs
  • Grant number:
    BB/J019232/1
  • Fiscal year:
    2012
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Embracing new technologies to streamline improve and sustain InterPro and its contributing databases
  • Grant number:
    BB/F010435/1
  • Fiscal year:
    2008
  • Funding amount:
    $957,500
  • Project category:
    Research Grant

Similar overseas grants

WELL-CALF: optimising accuracy for commercial adoption
  • Grant number:
    10093543
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Collaborative R&D
Investigating the acceptability and accuracy of cervical screening and self-sampling in postnatal women to coincide with the 6-week postnatal check-up
  • Grant number:
    MR/X030776/1
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
  • Grant number:
    2317232
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Continuing Grant
Sample Size calculations for UPDATing clinical prediction models to Ensure their accuracy and fairness in practice (SS-UPDATE)
  • Grant number:
    MR/Z503873/1
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
Improving accuracy, coverage, and sustainability of functional protein annotation in InterPro, Pfam and FunFam using Deep Learning methods PID 7012435
  • Grant number:
    BB/X018563/1
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
STTR Phase I: Microhydraulic Actuator for High-Accuracy, High-Speed Position Stages
  • Grant number:
    2335170
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
  • Grant number:
    2317233
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Continuing Grant
DMS-EPSRC: Certifying Accuracy of Randomized Algorithms in Numerical Linear Algebra
  • Grant number:
    EP/Y030990/1
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Research Grant
An innovative Lawtech AI/ML platform with human oversight that manages off-payroll worker status and periodically assesses the role status to ensure accuracy.
  • Grant number:
    10099483
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Collaborative R&D
CIF: Small: Ensuring Accuracy in Differentially Private Decentralized Optimization
  • Grant number:
    2334449
  • Fiscal year:
    2024
  • Funding amount:
    $957,500
  • Project category:
    Standard Grant