
A Database Of Conserved Domain Alignments


Basic information

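Grant number:
7316275
Principal investigator:
STEPHEN H. BRYANT
Amount:
--
Host institution:
NATIONAL LIBRARY OF MEDICINE
Host institution country:
United States
Project category:
Fiscal year:
Funding country:
United States
Start and end dates:
to
Project status:
Not concluded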

Source:
https://reporter.nih.gov/project-details/7316275
Keywords:
Database Conserved Domain Alignments

Project abstract

With the Conserved Domain Database (CDD) resource, we are producing a database of expert-curated protein domain alignments. Such alignment models describe the sequence and 3D-structure conservation within protein families, facilitating the annotation of conserved functional features. The alignment models also describe the variability present in a domain family, facilitating the depiction of its functional diversity.

This project describes curation of CDD alignments by human experts. The role of the CDD curators is multifaceted. First, they must survey the relevant scientific literature to produce concise summaries of the known functions of each domain family, to study existing sub-family classifications, and to choose citations useful to users of NCBI's web-based classification resources. Curators must also examine the results of automated sequence and structure comparison to infer the location of conserved core blocks, an iterative process that requires judgment in eliminating incomplete or erroneous sequence and structure data. Curators must also identify apparent orthology groups, based on the consensus of results from alternative molecular evolution and clustering methods. The curator group has so far produced about 1500 curated CDD families.

Both curated and un-curated multiple sequence alignments are used to generate position-specific scoring matrices (PSSMs), which may in turn be used in NCBI's web-based protein classification resources. A number of NCBI information services use CDD to identify conserved domains within protein sequences. Links to CDD are made by default from, for example:
1) NCBI's protein-BLAST resource, http://www.ncbi.nlm.nih.gov/BLAST/
2) proteins in NCBI's Entrez browser, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
3) records in NCBI's HomoloGene system, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene
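
The step from a multiple sequence alignment to a PSSM, mentioned above, can be illustrated with a minimal sketch. It is not the procedure used by CDD or NCBI: it assumes a uniform background distribution over the 20 standard amino acids, a fixed pseudocount, and no sequence weighting, and the names `build_pssm` and `score_segment` are hypothetical.

```python
import math
from collections import Counter

# Assumption: uniform background over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    pssm = []
    for col in range(length):
        # Count residues in this column; gaps and non-standard symbols are ignored.
        counts = Counter(row[col] for row in alignment if row[col] in BACKGROUND)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / BACKGROUND[aa])
        pssm.append(column_scores)
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores for an ungapped segment as long as the PSSM."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Toy alignment, invented for illustration only.
toy_alignment = ["ACDE", "ACDF", "AC-E", "GCDE"]
pssm = build_pssm(toy_alignment)
print(score_segment(pssm, "ACDE"), score_segment(pssm, "WWWW"))
```

A segment that matches the conserved columns scores higher than an unrelated one, which is the property a PSSM-based domain search relies on.
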
Further information about CDD and the search services that link to it is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. Curated domain models summarize the known functions of family members, using relevant citations from PubMed when possible, and may link to resources on the NCBI Bookshelf for further information. They also provide site-specific functional annotation, via sequence and structure alignments and via pre-recorded evidence-based features, such as interaction or active sites.

The CDD alignment curation project differs from comparable efforts, upon which it builds, in two fundamental ways: (i) 3D-structure information is used in a quantitative way, whenever possible, to guide the alignments, and (ii) an explicit hierarchy of families and subfamilies, related by descent from a common ancestor, reflects the evolutionary history of each domain super-family.

When at least one 3D structure is known within a domain family, this information is used to define the conserved homologous core structure, a set of un-gapped blocks that must be identified in all representative sequences included in the alignment. Representative sequences are aligned to this core structure using structure-informed alignment algorithms or, when multiple 3D structures are known, alignments obtained from structure superposition. These procedures ensure high alignment accuracy, as needed for accurate transfer of annotation to new family members identified by searching. Representative sequences are picked from a set of "preferred taxonomy nodes", so that the domain alignments represent the taxonomic span of a family, which in turn indicates its apparent evolutionary age.

Explicit hierarchies identify major gene duplication events in the molecular evolution of each family. Our basic strategy is to use domain-sequence clustering methods together with known domain architecture and phylogeny to identify what appear to be ancient orthology groups.
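
The conserved core described above is a set of un-gapped blocks present in every representative sequence. In CDD that core is defined by curators from 3D-structure information; the sketch below is a much simpler stand-in that only reports runs of gap-free columns in an existing alignment. The function name `ungapped_blocks`, the `min_length` parameter, and the toy sequences are assumptions made for illustration.

```python
def ungapped_blocks(alignment, min_length=1, gap_chars="-."):
    """Return (start, end) column ranges, end exclusive, with no gap in any row."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    blocks = []
    start = None  # first column of the block currently being extended
    for col in range(length):
        gap_free = all(row[col] not in gap_chars for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            # A gapped column closes the current block.
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    # Close a block that runs to the end of the alignment.
    if start is not None and length - start >= min_length:
        blocks.append((start, length))
    return blocks


# Toy alignment, invented for illustration only.
toy_alignment = [
    "MKV-LLAGG--TRE",
    "MKVALLSGGA-TKE",
    "MRV-LLAGG--SRE",
]
blocks = ungapped_blocks(toy_alignment, min_length=2)
for start, end in blocks:
    print(start, end, [row[start:end] for row in toy_alignment])
```

Raising `min_length` discards short blocks, a crude analogue of the curator's judgment about which conserved regions belong to the core.
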
The orthology groups identified in this way define explicitly annotated "children" of the overall "parent" alignment, and in turn provide more specific functional annotation.

The CDD project employs a high level of automation to produce structure-based alignments, to identify candidate orthology groups, to update CDD alignments with new sequences and structures, and to "publish" the results to web servers. The algorithms and associated software required are described under another project, "Alignment methods for a conserved domain database".
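
The parent and child relationship between alignment models can be pictured as a small tree in which each child carries a more specific annotation than its parent. The sketch below is only an illustration of that idea: the class `DomainModel`, its fields, and the family names are hypothetical and do not reflect CDD's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class DomainModel:
    """One alignment model in a hypothetical family hierarchy."""
    name: str
    annotation: str
    children: List["DomainModel"] = field(default_factory=list)
    parent: Optional["DomainModel"] = field(default=None, repr=False)

    def add_child(self, child: "DomainModel") -> "DomainModel":
        """Attach a more specific sub-family model beneath this one."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return model names from the top-level parent down to this model."""
        names = []
        node: Optional["DomainModel"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def annotations(self) -> List[str]:
        """Return annotations from most general to most specific."""
        collected = []
        node: Optional["DomainModel"] = self
        while node is not None:
            collected.append(node.annotation)
            node = node.parent
        return list(reversed(collected))


# Hypothetical names and annotations, invented for illustration only.
parent = DomainModel("example_superfamily", "general fold and function")
child = parent.add_child(DomainModel("example_subfamily_A", "specific substrate"))
print(child.lineage())
print(child.annotations())
```
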
