BBSRC-NSF/BIO: CIBR: Implementing an explicit phylogenetic framework for large-scale protein sequence annotation
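Basic information
- Award number: 1917302
- Principal investigator:
- Amount: $828,200
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-08-15 to 2023-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract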
Technological advances have enabled researchers to determine the chemical composition (sequence) of millions of different proteins from thousands of organisms. However, making use of this information in applications such as medicine and agriculture requires additional work to determine the functions of these proteins: what they do at the molecular and organism levels. The first step in determining the function of a new protein is to compare its sequence to other proteins whose functions have been analyzed experimentally. Under previous funding, computational methods were developed for reconstructing the evolutionary history of each group of related proteins, and this history was used to suggest functions that have been conserved during evolution and to provide a starting point for protein function analysis. The next step in this work is to enable application to the millions of protein sequences available, and to extend the method to include more detailed information about protein function. This project will develop and test a practical, production-grade implementation of our method, and apply it to UniProt, the world's largest database of protein sequences. UniProt is publicly available, so results will be broadly available and usable by scientists and non-scientists alike. Educational materials will be developed to help make the results more accessible to students and non-scientists.
The UniProt protein knowledgebase aims to maximize the utility of protein sequence data to the scientific community by representing not only the sequences themselves, but also annotations: metadata describing information that can be inferred about those sequences, such as predicted protein function. The current approach to large-scale annotation of UniProt, called UniRule, relies on ad hoc rules to define sets of proteins that should be annotated similarly. While these rules implicitly utilize information about evolutionary relationships (e.g. membership in a protein family), they do not model function evolution explicitly and are therefore limited in the specificity of annotations they can express. This project implements an explicit evolutionary approach to large-scale sequence annotation, building upon previous work 1) on evolutionary modeling of gain and loss of protein functions (represented as terms from the Gene Ontology) in gene families, and 2) on software to reconstruct the evolutionary history of any arbitrary protein sequence by placing it in the context of a phylogenetic tree. Production-level implementation of this approach within the UniProt resource will integrate the large-scale annotation systems already used in the UniProt and Gene Ontology projects, and will result in increased specificity and coverage of annotations in the UniProt knowledgebase. The project will significantly improve annotations on tens of millions of sequences in the UniProt knowledgebase, impacting the massive UniProt user base. It will also provide the annotations for Gene Ontology-based analyses of the large number of fully sequenced genomes in UniProt, making such analyses more broadly available. A new online educational module on protein function evolution will be developed, along with a curated learning path that starts with this module and includes other available online modules. The results will be available in the UniProt resource (uniprot.org), and all software and annotation metadata will be available at pantree.org.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
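The idea of modeling gain and loss of Gene Ontology terms along a phylogenetic tree can be illustrated with a toy sketch. This is not the project's software. The `Node` class, the `propagate` function, the tree shape, and the choice of example terms are all assumptions made for illustration. Each branch may gain or lose terms, and every sequence at a leaf, including a newly placed one, inherits whatever terms survive along its path from the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical gene-family tree."""
    name: str
    gains: set = field(default_factory=set)    # GO terms gained on the branch leading here
    losses: set = field(default_factory=set)   # GO terms lost on the branch leading here
    children: list = field(default_factory=list)


def propagate(node, inherited=frozenset(), out=None):
    """Return {leaf name: set of GO terms} by passing terms from root to leaves."""
    if out is None:
        out = {}
    # Terms at this node: inherited terms plus gains, minus losses.
    terms = (set(inherited) | node.gains) - node.losses
    if not node.children:
        out[node.name] = terms
    for child in node.children:
        propagate(child, frozenset(terms), out)
    return out


# Illustrative family (assumed, not real data): one term arises at the root,
# and one clade later loses it and gains a different term.
tree = Node("root", gains={"GO:0016301"}, children=[
    Node("seqA"),
    Node("clade1", losses={"GO:0016301"}, gains={"GO:0005515"}, children=[
        Node("seqB"),
        Node("new_seq"),  # a newly placed sequence inherits its clade's terms
    ]),
])

annotations = propagate(tree)
for leaf, terms in sorted(annotations.items()):
    print(leaf, sorted(terms))
```

In this sketch a rule that annotated the whole family identically would assign the root term to every member, whereas the explicit gain and loss events give `seqB` and `new_seq` a different annotation from `seqA`.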
Project outputs
Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Quest for Orthologs orthology benchmark service in 2022.
- DOI: 10.1093/nar/gkac330
- Published: 2022-07-05
- Journal:
- Impact factor: 14.9
- Authors: Nevers Y;Jones TEM;Jyothi D;Yates B;Ferret M;Portell-Silva L;Codo L;Cosentino S;Marcet-Houben M;Vlasova A;Poidevin L;Kress A;Hickman M;Persson E;Piližota I;Guijarro-Clarke C;OpenEBench team the Quest for Orthologs Consortium;Iwasaki W;Lecompte O;Sonnhammer E;Roos DS;Gabaldón T;Thybert D;Thomas PD;Hu Y;Emms DM;Bruford E;Capella-Gutierrez S;Martin MJ;Dessimoz C;Altenhoff A
- Corresponding author: Altenhoff A
Other publications by Paul Thomas
Bystander reporting to prevent violent extremism and targeted violence: learning from practitioners
- DOI:
- Published: 2022
- Journal:
- Impact factor: 2
- Authors: D. Eisenman;S. Weine;Nilpa D. Shah;Nicole V. Jones;Chloe Polutnik Smith;Paul Thomas;Michele Grossman
- Corresponding author: Michele Grossman
Antenatal screening for haemoglobinopathies in primary care: a whole system participatory action research project.
- DOI:
- Published: 2005
- Journal:
- Impact factor: 0
- Authors: Paul Thomas;Lola Oni;M. Alli;J.N. D.M. K. D. J.J. St. Hilaire;Alma Smith;C. Leavey;Ricky Banarsee
- Corresponding author: Ricky Banarsee
Towards Searching Amongst Tables
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Paul Thomas;Rollin M. Omari;Tom Rowlands
- Corresponding author: Tom Rowlands
Enhancing Human Annotation: Leveraging Large Language Models and Efficient Batch Processing
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Oleg Zendel;J. Culpepper;Falk Scholer;Paul Thomas
- Corresponding author: Paul Thomas
Search Interfaces for Biomedical Searching: How do Gaze, User Perception, Search Behaviour and Search Performance Relate?
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Ying;Paul Thomas;Tom Gedeon;Nicolay Rusnachenko
- Corresponding author: Nicolay Rusnachenko
Other grants by Paul Thomas
Bilateral BBSRC-NSF/BIO: Towards detailed and consistent function prediction from protein family databases
- Award number: 1458808
- Fiscal year: 2015
- Amount: $828,200
- Award type: Continuing Grant
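Similar NSFC (National Natural Science Foundation of China) grants
Mechanism by which SYNJ1 protein fragments contribute to the onset of Parkinson's disease by promoting aggregation of the synaptic protein NSF
- Award number:
- Approval year: 2022
- Amount: 300,000 CNY
- Award type: Young Scientists Fund
Role and mechanism of GluA2-containing AMPA receptor membrane stability mediated by S-nitrosylation of the NSF protein in post-stroke depression
- Award number: 82071300
- Approval year: 2020
- Amount: 550,000 CNY
- Award type: General Program
Participation in the China-US (NSFC-NSF) biodiversity program review meeting
- Award number:
- Approval year: 2019
- Amount: 20,000 CNY
- Award type: International (Regional) Cooperation and Exchange Program
Participation in the China-US (NSFC-NSF) biodiversity program review meeting
- Award number: 31981220281
- Approval year: 2019
- Amount: 23,000 CNY
- Award type: International (Regional) Cooperation and Exchange Program
China-US (NSFC-NSF) EEID joint review meeting
- Award number:
- Approval year: 2019
- Amount: 26,000 CNY
- Award type: International (Regional) Cooperation and Exchange Program
China-US (NSFC-NSF) EEID joint review meeting
- Award number: 81981220037
- Approval year: 2019
- Amount: 21,000 CNY
- Award type: International (Regional) Cooperation and Exchange Program
China-US (NSFC-NSF) EEID joint review meeting
- Award number:
- Approval year: 2019
- Amount: 12,000 CNY
- Award type: International (Regional) Cooperation and Exchange Program
Mechanism by which Mon1b cooperates with NSF to regulate early endosome membrane fusion
- Award number: 31671397
- Approval year: 2016
- Amount: 670,000 CNY
- Award type: General Program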
Similar overseas grants
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Award number: BB/Y000455/1
- Fiscal year: 2024
- Amount: $828,200
- Award type: Research Grant
BBSRC-NSF/BIO: An AI-based domain classification platform for 200 million 3D-models of proteins to reveal protein evolution
- Award number: BB/Y001117/1
- Fiscal year: 2024
- Amount: $828,200
- Award type: Research Grant
22-BBSRC/NSF-BIO Building synthetic regulatory units to understand the complexity of mammalian gene expression
- Award number: BB/Y008898/1
- Fiscal year: 2024
- Amount: $828,200
- Award type: Research Grant
20-BBSRC/NSF-BIO Regulatory control of innate immune response in marine invertebrates
- Award number: BB/W017865/1
- Fiscal year: 2024
- Amount: $828,200
- Award type: Research Grant
22-BBSRC/NSF-BIO - Interpretable & Noise-robust Machine Learning for Neurophysiology
- Award number: BB/Y008758/1
- Fiscal year: 2024
- Amount: $828,200
- Award type: Research Grant
22-BBSRC/NSF-BIO: Community-dependent CRISPR-cas evolution and robust community function
- Award number: BB/Y008774/1
- Fiscal year: 2024
- Amount: $828,200
- Award type: Research Grant
UKRI/BBSRC-NSF/BIO: Interpretable and Noise-Robust Machine Learning for Neurophysiology
- Award number: 2321840
- Fiscal year: 2023
- Amount: $828,200
- Award type: Continuing Grant
21-BBSRC/NSF-BIO: Developing large serine integrases as tools for constructing and manipulating synthetic replicons.
- Award number: BB/X012085/1
- Fiscal year: 2023
- Amount: $828,200
- Award type: Research Grant
UKRI/BBSRC-NSF/BIO Determining the Roles of Fusarium Effector Proteases in Plant Pathogenesis
- Award number: BB/X012131/1
- Fiscal year: 2023
- Amount: $828,200
- Award type: Research Grant
BBSRC-NSF/BIO. Globally harmonized re-analysis of Data Independent Acquisition (DIA) proteomics datasets enables the creation of new resources
- Award number: BB/X002020/1
- Fiscal year: 2023
- Amount: $828,200
- Award type: Research Grant