ACTIVE SITE SIGNATURES FOR AUTOMATIC UPDATES OF SFLD SUPERFAMILIES

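Basic Information

Grant number: 8363621
Principal investigator: PATRICIA CLEMENT BABBITT
Amount: $16,800
Host institution: UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
Host institution country: United States
Project category:
Fiscal year: 2011
Funding country: United States
Project period: 2011-07-01 to 2012-06-30
Project status: Completed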

Project Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. Primary support for the subproject and the subproject's principal investigator may have been provided by other sources, including other NIH sources. The Total Cost listed for the subproject likely represents the estimated amount of Center infrastructure utilized by the subproject, not direct funding provided by the NCRR grant to the subproject or subproject staff.
A major unsolved problem in the computational prediction of structure-function linkage is that, although protein sequences and structures can be clustered accurately and with good statistical significance using many types of similarity metrics, it is not clear how those clusters map to functional classes. Simple approaches such as ortholog prediction can achieve good results for sequences that are closely similar or that contain readily identifiable motifs distinguishing functional classes, but for many protein superfamilies successful prediction is far from trivial. This is the case for the functionally diverse superfamilies in the SFLD. These are homologous sets of enzymes that carry out different chemical transformations, using different substrates, but that all share a specific chemical functionality or partial reaction. The main purpose of the SFLD is to aid researchers in the curation of these types of superfamilies, to help identify new members of these superfamilies, and to provide an explicit structure-function mapping for these enzymes. Because the different functional families in a given superfamily look similar but perform different specific reactions, they are difficult to annotate and easy to misannotate, with misannotation levels as high as 80% in the archival databases GenBank NR and TrEMBL.
Because sequence information is still becoming available in large volumes, automated methods are required to update the SFLD superfamilies with newly determined sequences and to assign those sequences to the appropriate functional families. Improved methods for making these functional assignments are urgently needed. Developing such an approach has been a major focus of the RBVI in collaboration with the group of Prof. Jacquelyn Fetrow of Wake Forest University. The active site profiling methods developed by Dr. Fetrow have now been integrated with an approach developed in the Babbitt lab, Genetic Algorithm Search for Patterns in Structures (GASPS), to automatically determine 3D templates capable of distinguishing new superfamily members, so that sequences can be assigned automatically to the specific functional families to which they belong. GASPS will be combined with Fetrow's methods to create sequence and structural motifs for automated clustering of SFLD data. The core elements of the method include a motif-generating technology called "Fuzzy Functional Forms" (FFF), implemented by the tool Protein Active Site Structure Search (PASSS), and the Deacon Active Site Profiler (DASP), which uses three-dimensional, or structure-based, active-site profiling to identify residues located in the spatial environment around the active site. PASSS uses the FFF technology, describing a protein's functional site by the distances between the alpha carbons of three key residues important to the functional site chemistry and the alpha carbons of adjacent residues. Based on the premise that functionally related proteins should have structural similarity at the functional site, PASSS returns proteins related to the starting known functional site.
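The distance-based description of a functional site can be illustrated with a minimal sketch. This is not the PASSS or FFF implementation: the function names, the coordinates, and the tolerance value are hypothetical, and the sketch assumes the three key residues of the candidate are supplied in the same order as those of the template.

```python
import math
from itertools import combinations

def ca_distances(coords):
    """Pairwise distances between alpha-carbon coordinates.

    coords: list of (x, y, z) tuples, one per key residue.
    Pairs are returned in the order (0,1), (0,2), (1,2), ...
    """
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def matches_template(template, candidate, tolerance=1.0):
    """Return True if every pairwise distance in the candidate is within
    `tolerance` of the corresponding template distance.

    `tolerance` is an assumed, illustrative value, not one taken from
    the project description.
    """
    ref = ca_distances(template)
    obs = ca_distances(candidate)
    return all(abs(r - o) <= tolerance for r, o in zip(ref, obs))

# Hypothetical alpha-carbon coordinates for three key residues.
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
shifted = [(10.0, 10.0, 10.0), (13.2, 10.0, 10.0), (10.0, 14.1, 10.0)]
stretched = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

print(ca_distances(template))                                # [3.0, 4.0, 5.0]
print(matches_template(template, shifted, tolerance=0.5))    # True
print(matches_template(template, stretched, tolerance=0.5))  # False
```

Because only inter-residue distances are compared, the check does not depend on where the protein sits in space or how it is rotated, which is why the translated candidate still matches.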
DASP expands on this by extracting the residues found in the vicinity of the key residues for each protein, creating motifs from these fragments, and using the fragments to search all sequences in a database and return proteins that may share this function. Used together and iteratively, these tools provide a quick way to assign putative functions to both structures and sequences. Preliminary results from this project show exceptional accuracy in distinguishing functionally diverse families in the enolase and kinase superfamilies. The former is one of the annotated superfamilies in the SFLD and serves as a challenging test system for this type of automated effort.
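The fragment-extraction step described above can be sketched as follows. This is a toy illustration under stated assumptions, not the DASP implementation: the radius, the function names, and the example structure are hypothetical, and exact substring matching stands in for whatever profile scoring the real tool applies.

```python
import math

def residues_near(structure, key_indices, radius=10.0):
    """Indices of residues whose alpha carbon lies within `radius` of the
    alpha carbon of any key residue.

    structure: list of (one_letter_code, (x, y, z)) in sequence order.
    `radius` is an assumed cutoff; the abstract only says "vicinity".
    """
    keys = [structure[i][1] for i in key_indices]
    return [i for i, (_, xyz) in enumerate(structure)
            if any(math.dist(xyz, k) <= radius for k in keys)]

def to_fragments(structure, indices):
    """Group consecutive residue indices into contiguous sequence fragments."""
    fragments, current, prev = [], "", None
    for i in indices:
        if prev is not None and i != prev + 1:
            fragments.append(current)
            current = ""
        current += structure[i][0]
        prev = i
    if current:
        fragments.append(current)
    return fragments

def contains_fragments(sequence, fragments):
    """Simplified search: require every fragment as an exact substring."""
    return all(f in sequence for f in fragments)

# Hypothetical structure: ten residues on a straight line, 3.8 units apart.
structure = [(aa, (3.8 * i, 0.0, 0.0)) for i, aa in enumerate("MKDAELGHKT")]
near = residues_near(structure, key_indices=[2, 7], radius=4.0)
fragments = to_fragments(structure, near)

print(near)       # [1, 2, 3, 6, 7, 8]
print(fragments)  # ['KDA', 'GHK']
print(contains_fragments("AAKDAXXGHKZZ", fragments))  # True
```

A database search in this simplified form would apply `contains_fragments` to each sequence in turn; a practical version would need to tolerate substitutions and score partial matches rather than require exact substrings.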
