Remote homology detection with evolutionary profile HMMs

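Basic information

  • Award number:
    2151294
  • Principal investigator:
  • Amount:
    $665,200
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Period:
    2022-06-15 to 2025-05-31
  • Status:
    Ongoing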

Project abstract

Characterizing the properties and functions of proteins is fundamental to the life sciences and to bioengineering. Consider the spike protein that controls the entry of SARS-CoV-2 into human cells, the many proteins that control tumor growth and spread in cancer, or the newly discovered enzymes that can convert plastic waste into usable proteins. One way to increase the number of functionally characterized proteins is to find similarities to other, already known proteins. For some time the scientific community has had several foundational and widely used technologies that compare proteins based on their amino acid sequences, a problem known as protein homology detection. Computational methods such as BLAST and HMMER, which find similarities among proteins in the sequence databases, are used routinely by experimental biologists in all branches of the life sciences. Still, many proteins found in living cells remain functionally uncharacterized. This project aims to implement, within the HMMER software package, a computational method able to recognize biological sequence similarities that current methods cannot yet detect. The method, which uses statistical models of sequence evolution, will provide hints about the function of many more proteins by establishing homologies between protein families currently assumed to be unrelated. It will open the door for many more proteins to become bioengineering targets. Graduate and undergraduate students will be trained in statistical bioinformatics in the course of this project.
Protein homology analysis now relies on profile searches much more than on pairwise sequence searches, but profile parameterization is still typically calibrated not to the expected evolutionary distance of remote homologs but only to the distances among the observed sequences in the alignment used for training. This project proposes to turn those currently fixed parameters into functions that depend on evolutionary divergence. Turning the standard probabilistic methods of homology detection into divergence-parameterized models is novel and should improve sensitivity to very remote homologs. Substitution events have been modeled with PAM/BLOSUM matrices as well as with more explicit substitution models, but developing evolutionary models that deal with insertion and deletion events has proven difficult. This project will build upon the mathematical foundations for evolutionary models developed earlier by this group and others, which are suitable for the standard profile methods (with insertions and deletions) used for homology detection. Those evolutionary models will be implemented in a next version of the HMMER software suite for remote homology detection and multiple sequence alignment. Stretching the parameters of the homology methods into divergence regimes beyond what has been observed, guided by an evolutionary model, will result in more sensitive homology detection. The method will also provide a natural tool for setting statistical boundaries on the detectability of homologs and for identifying clade-specific genes. It will integrate homology with phylogeny into one more powerful detection tool.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
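As a rough illustration of what "parameters as functions of evolutionary divergence" can mean, the sketch below evolves the emission probabilities of a single profile position under an equal-rates substitution model (a generalized Jukes-Cantor model). This is not the method proposed in the award and not HMMER code: the function name `evolve_emissions`, the `rate` parameter, and the toy 4-letter alphabet are all assumptions made for the example, and insertions and deletions are ignored entirely.

```python
import math

def evolve_emissions(e0, t, rate=1.0):
    """Evolve emission probabilities e0 over divergence t.

    Assumes an equal-rates substitution model (hypothetical choice):
    every residue changes to every other residue at the same rate.
    For K residues the closed form is
        e_t(b) = e0(b) * exp(-K * rate * t) + (1 - exp(-K * rate * t)) / K
    so t = 0 gives back e0 and large t approaches the uniform distribution.
    """
    k = len(e0)
    decay = math.exp(-k * rate * t)
    return [p * decay + (1.0 - decay) / k for p in e0]

# Toy profile position on a 4-letter alphabet, strongly conserved
# for the first residue (illustrative numbers, not real data).
trained = [0.85, 0.05, 0.05, 0.05]

for t in (0.0, 0.1, 0.5, 2.0):
    evolved = evolve_emissions(trained, t, rate=0.25)
    print(t, [round(p, 3) for p in evolved])
```

At small divergence the position keeps its trained preference, and as divergence grows the emissions flatten toward the background. A search for remote homologs could then score a sequence with emissions evaluated at a larger `t` than the training alignment represents, which is the intuition behind divergence-parameterized profiles.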

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Elena Rivas

Fitness functions for RNA structure design
  • DOI:
    10.1101/2022.06.16.496369
  • Year published:
    2022
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Max Ward;Eliot Courtney;Elena Rivas
  • Corresponding author:
    Elena Rivas
The ‘squalene route’ to carotenoid biosynthesis is widespread in Bacteria
  • DOI:
    10.1101/2021.12.22.473825
  • Year published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Carlos Santana;Valentina Henriques;D. Hornero;D. Devos;Elena Rivas
  • Corresponding author:
    Elena Rivas
RNA structure prediction using positive and negative evolutionary information
  • DOI:
    10.1101/2020.02.04.933952
  • Year published:
    2020-02
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Elena Rivas
  • Corresponding author:
    Elena Rivas
Genetic dissection of independent and cooperative transcriptional activation by the LysR-type activator ThnR at close divergent promoters
  • DOI:
  • Year published:
    2016
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Elena Rivas;B. Floriano;E. Santero
  • Corresponding author:
    E. Santero
Response to Tavares et al., “Covariation analysis with improved parameters reveals conservation in lncRNA structures”
  • DOI:
    10.1101/2020.02.18.955047
  • Year published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Elena Rivas;S. Eddy
  • Corresponding author:
    S. Eddy

Other grants held by Elena Rivas

Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data
  • Award number:
    2243704
  • Fiscal year:
    2023
  • Amount:
    $665,200
  • Award type:
    Standard Grant

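Similar National Natural Science Foundation of China grants

Homeomorphisms of fibered knots, Floer homology, and the 4-dimensional genus
  • Award number:
    12301086
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund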

Similar international grants

Remote Homology Detection for Metagenomic Libraries
  • Award number:
    480839-2015
  • Fiscal year:
    2015
  • Amount:
    $665,200
  • Award type:
    Alexander Graham Bell Canada Graduate Scholarships - Master's
UTILIZING TERAGRID TO DETECT REMOTE SIMILARITY PROTEIN SEQUENCES
  • Award number:
    7956240
  • Fiscal year:
    2009
  • Amount:
    $665,200
  • Award type:
REMOTE PROTEIN SEQUENCE HOMOLOGY DETECTION
  • Award number:
    7956227
  • Fiscal year:
    2009
  • Amount:
    $665,200
  • Award type:
Computational Methods for Wrapping and Threading Remote Protein Homologs
  • Award number:
    7624601
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type:
Computational Methods for Wrapping and Threading Remote Protein Homologs
  • Award number:
    8076912
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type:
Computational Methods for Wrapping and Threading Remote Protein Homologs
  • Award number:
    7851282
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type:
Remote homology detection and beyond
  • Award number:
    333982-2006
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type:
    Postgraduate Scholarships - Doctoral
UTILIZING TERAGRID TO DETECT REMOTE SIMILARITY PROTEIN SEQUENCES
  • Award number:
    7723381
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type:
REMOTE PROTEIN SEQUENCE HOMOLOGY DETECTION
  • Award number:
    7723368
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type:
Computational Methods for Wrapping and Threading Remote Protein Homologs
  • Award number:
    7460514
  • Fiscal year:
    2008
  • Amount:
    $665,200
  • Award type: