
Alignment Methods For A Conserved Domain Database


Basic Information

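Grant number:
6843565
Principal investigator:
STEPHEN H. BRYANT
Amount:
--
Host institution:
NATIONAL LIBRARY OF MEDICINE
Host institution country:
United States
Project type:
Fiscal year:
Funding country:
United States
Project period:
to
Project status:
Not concluded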

Source:
https://reporter.nih.gov/project-details/6843565
Keywords:
chemical models; computer assisted sequence analysis; computer simulation; computer system design/evaluation; model design/development; protein folding; protein sequence; statistics/biometry; thermodynamics

Project Abstract

We have developed computer methods to compare a protein's sequence with a library of "folds" from the structural database. The sequence is "threaded" through alternative structures, and those most compatible are identified by energy calculations, using contact potentials. Since they directly detect structural similarity, threading methods can identify very distant evolutionary relationships that may be undetectable by sequence comparison. Research has focused on testing of the core-element threading method, in blind predictions and control experiments, and on algorithmic improvements to increase sensitivity. Control experiments using known structures identified thresholds for successful fold recognition and accurate modeling: the similar "core" substructure must comprise 60% or more of the protein and must superpose to a residual of 2.5 Angstroms or less, such that a large fraction of contacts are preserved. Analysis of predictions for the 1996 CASP2 workshop (Critical Assessment of Structure Prediction) confirmed this conclusion. Structural similarity can be less extensive in some cases of distant relationship, however, and several improvements to increase sensitivity have been considered. New definitions of the "core" of database structures, according to the regions superimposable in homologs with known structures, have been shown to reduce false negatives in threading predictions. Combination of contact potentials with sequence-motif scores was also shown to increase sensitivity in difficult recognition problems. Use of rigorous p-value calculations was shown to reduce false positives. With these improvements fold recognition may be expected to reliably detect a greater proportion of the distant evolutionary relationships. This has been demonstrated at the 1998 CASP3 workshop, where the NCBI team was awarded "first place" in fold recognition, among over 90 international groups entering the competition.
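As a rough illustration of the threading idea described above, the sketch below scores a sequence against alternative folds by summing a contact potential over residue pairs that touch in each structure, then ranks the folds by energy. It is a toy under stated assumptions, not the project's method: the two-class alphabet, the energy values in `POTENTIAL`, the ungapped placement, and all function names are hypothetical. Only the 60% core and 2.5 Angstrom thresholds come from the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical two-class alphabet: H = hydrophobic, P = polar.
HYDROPHOBIC = set("AVLIMFWC")

# Illustrative contact energies (arbitrary units, lower is better).
# These values are invented for the sketch, not taken from the project.
POTENTIAL: Dict[Tuple[str, str], float] = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.2,
    ("P", "H"): 0.2,
    ("P", "P"): 0.0,
}


def residue_class(aa: str) -> str:
    """Map an amino-acid letter to the toy H/P class."""
    return "H" if aa in HYDROPHOBIC else "P"


def threading_energy(sequence: str, contacts: List[Tuple[int, int]]) -> float:
    """Sum contact energies for a sequence placed, ungapped, on one fold.

    `contacts` lists pairs of positions that touch in the fold's structure.
    """
    total = 0.0
    for i, j in contacts:
        pair = (residue_class(sequence[i]), residue_class(sequence[j]))
        total += POTENTIAL[pair]
    return total


def rank_folds(sequence: str, folds: Dict[str, List[Tuple[int, int]]]) -> List[str]:
    """Return fold names ordered from most to least compatible (lowest energy first)."""
    return sorted(folds, key=lambda name: threading_energy(sequence, folds[name]))


def passes_core_thresholds(core_fraction: float, rmsd_angstrom: float) -> bool:
    """Check the thresholds quoted in the abstract: core >= 60%, residual <= 2.5 Angstroms."""
    return core_fraction >= 0.60 and rmsd_angstrom <= 2.5
```

For example, `rank_folds("LAKLEK", {"fold_a": [(0, 3), (1, 3)], "fold_b": [(2, 4), (2, 5)]})` places `fold_a` first, because its contacts pair hydrophobic residues. A real threading method would also search over alignments and loop lengths and would assess significance, for instance with the p-value calculations the abstract mentions.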
The threading methods developed in this project are now being applied to construction of a conserved domain database (CDD). Seed domain alignments, derived from sequence comparison, are mapped onto known 3D structures and compared to 3D structure alignments, to define a core-structure alignment for a sample of representative domains. These alignments are validated by threading calculations, and additional representative sequences detected by RPS-BLAST scanning are merged into the alignment by threading. A newly developed algorithm for sequence vs. PSSM (position-specific score matrix) alignment using core-element "blocks" has greatly sped up these calculations, and made core-element alignment into a practical tool for construction of curated protein domain alignments. CDD alignments serve as a protein classification system for public information retrieval services. Domains with conserved structure and function are easily identified, and visualization of the resulting sequence/structure alignments provides a detailed annotation of structure-function relationships. Work this year has focused on construction of the CDTree alignment hierarchy editing system. Versions 1 and then 2 were deployed to the CDD curation team, and release to the public is anticipated next year.
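The sequence vs. PSSM alignment with core-element "blocks" mentioned above can be pictured with a small dynamic program. The abstract does not give the algorithm's details, so this sketch makes its own assumptions: each block is an ungapped run of PSSM columns, blocks must appear in order without overlapping, loop residues between blocks cost nothing, and residues missing from a column get a hypothetical default score. The names `block_score` and `align_blocks` are illustrative only.

```python
from typing import Dict, List, Tuple

Column = Dict[str, float]  # residue -> score for one PSSM column
Block = List[Column]       # one ungapped core block (consecutive columns)


def block_score(seq: str, start: int, block: Block, default: float = -1.0) -> float:
    """Score one block placed without gaps at seq[start:start + len(block)]."""
    return sum(col.get(seq[start + k], default) for k, col in enumerate(block))


def align_blocks(seq: str, blocks: List[Block]) -> Tuple[float, List[int]]:
    """Place all blocks in order on seq; return (best score, block start positions)."""
    n = len(seq)
    neg_inf = float("-inf")
    # best[b][p]: best score for placing the first b blocks inside seq[:p].
    best = [[neg_inf] * (n + 1) for _ in range(len(blocks) + 1)]
    best[0] = [0.0] * (n + 1)
    # placed[b][p]: True when, in the optimum for best[b][p], block b ends at p.
    placed = [[False] * (n + 1) for _ in range(len(blocks) + 1)]

    for b, block in enumerate(blocks, start=1):
        width = len(block)
        for p in range(1, n + 1):
            best[b][p] = best[b][p - 1]  # residue p-1 is left in a loop
            if p >= width and best[b - 1][p - width] > neg_inf:
                cand = best[b - 1][p - width] + block_score(seq, p - width, block)
                if cand > best[b][p]:
                    best[b][p] = cand
                    placed[b][p] = True

    if best[len(blocks)][n] == neg_inf:
        raise ValueError("sequence is too short to hold all blocks")

    # Trace back from the end of the sequence to recover block start positions.
    starts: List[int] = []
    p = n
    for b in range(len(blocks), 0, -1):
        while not placed[b][p]:
            p -= 1
        p -= len(blocks[b - 1])
        starts.append(p)
    starts.reverse()
    return best[len(blocks)][n], starts
```

Restricting gaps to the loops between blocks keeps the search to one table of size (number of blocks) by (sequence length), which is one plausible reason a block-based formulation can be fast. Whether the project's algorithm works this way is not stated in the abstract.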
