A Database Of Conserved Domain Alignments

Basic Information

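Grant number:
6681403
Principal investigator:
STEPHEN H. BRYANT
Amount:
--
Institution:
NATIONAL LIBRARY OF MEDICINE
Institution country:
United States
Project type:
Fiscal year:
Funding country:
United States
Project period:
to
Project status:
Not concluded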

Source:
https://reporter.nih.gov/project-details/6681403
Keywords:
Internet, biochemical evolution, computer program/software, literature survey, molecular biology information system, molecular genetics, protein sequence, protein structure function, structural biology

Project Abstract

We are producing a database of expert-curated protein domain alignments that describe sequence and 3D-structure conservation within protein families. These alignments are used to produce position-specific score matrices (PSSMs), which may in turn be used by NCBI's web-based protein classification resources. Links to the Conserved Domain Database (CDD) are made by default from NCBI's BLAST resource, http://www.ncbi.nlm.nih.gov/BLAST/, and from protein records in NCBI's PubMed/Entrez browser, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi. Further information about CDD and these search services is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. These servers may be used to identify conserved domains within a protein sequence. They summarize the known functions of family members, citing relevant PubMed literature when possible, and they provide site-specific functional annotation via sequence and structure alignments and via evidence-based interaction-site features.
The CDD alignment project differs from earlier efforts in two fundamental ways: 3D-structure information is used whenever possible to guide alignments, and an explicit hierarchy of families and subfamilies describes the evolutionary history of each domain. When the 3D structure of a member of a domain family is known, this information is used to define a conserved 3D core structure: a set of ungapped blocks that must be identified in every representative sequence included in the alignment. Representative sequences are aligned to this core structure using threading or structure-based alignment algorithms or, when multiple structures are known, by structure-structure alignment. These procedures ensure the high alignment accuracy needed to transfer annotation accurately to new family members identified by searching. Explicit hierarchies identify major gene duplication events in the molecular evolution of each family.
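The path from a core of ungapped blocks shared by every representative sequence to a position-specific score matrix can be illustrated with a toy sketch. It assumes a plain multiple alignment given as equal-length strings with `-` for gaps, derives "core blocks" simply as runs of gap-free columns (CDD instead defines them from 3D structure and curator judgment), and scores each block column as a base-2 log-odds against a uniform background with a flat pseudocount. The function names `ungapped_blocks`, `block_pssm`, and `score` and the example alignment are hypothetical; production PSSMs typically use more elaborate sequence weighting and pseudocount schemes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ungapped_blocks(alignment, min_len=3):
    """Return (start, end) column ranges, end exclusive, in which no sequence has a gap.

    Toy stand-in for CDD core blocks: gap-free runs shorter than min_len are dropped.
    """
    n_cols = len(alignment[0])
    blocks, start = [], None
    for col in range(n_cols):
        gap_free = all(seq[col] != "-" for seq in alignment)
        if gap_free and start is None:
            start = col  # a new gap-free run begins
        elif not gap_free and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None  # the run ended at a gapped column
    if start is not None and n_cols - start >= min_len:
        blocks.append((start, n_cols))  # run extends to the last column
    return blocks


def block_pssm(alignment, blocks, pseudocount=1.0):
    """Base-2 log-odds score for each residue at each core-block column.

    Assumes a uniform background frequency (1/20) and a flat pseudocount per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for start, end in blocks:
        for col in range(start, end):
            counts = Counter(seq[col] for seq in alignment)
            n = sum(counts[aa] for aa in AMINO_ACIDS)  # ignore non-standard symbols
            total = n + pseudocount * len(AMINO_ACIDS)
            pssm.append({
                aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                for aa in AMINO_ACIDS
            })
    return pssm


def score(pssm, segment):
    """Sum the PSSM scores of an ungapped segment as long as the concatenated blocks."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the number of PSSM rows")
    return sum(row[aa] for row, aa in zip(pssm, segment))


if __name__ == "__main__":
    aln = ["MKV-LAGW", "MRVELAGW", "MKI-LSGW"]  # made-up toy alignment
    blocks = ungapped_blocks(aln)
    print(blocks)  # [(0, 3), (4, 8)]
    pssm = block_pssm(aln, blocks)
    print(score(pssm, "MKVLAGW") > score(pssm, "AAAAAAA"))  # True
```

Column 3 is gapped in two of the three toy sequences, so it splits the alignment into two core blocks, and only the seven block columns contribute PSSM rows. A segment matching the block consensus scores higher than an unrelated one, which is the property a domain search relies on.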
Our basic strategy is to combine domain-sequence clustering methods with known domain architecture and phylogeny to identify what appear to be ancient orthology groups. These define explicitly annotated "children" of the overall "parent" alignment and in turn provide more specific functional annotation. The CDD project employs a high level of automation to produce structure-based alignments, identify candidate orthology groups, update CDD alignments with new sequences and structures, and "publish" the results to web servers. The required algorithms and associated software are described under a separate project, "Alignment methods for a conserved domain database." This project covers human-expert curation of CDD alignments. The role of the CDD curators is multifaceted. First, they must survey the relevant scientific literature to produce concise summaries of the known functions of each domain family and to choose citations useful to users of NCBI's web-based classification resources. Curators must also examine the results of automated sequence and structure comparison to infer the location of conserved core blocks, an iterative process that requires judgment about eliminating incomplete or erroneous sequence and structure data. Curators must also identify apparent orthology groups, based on the consensus of results from alternative molecular-evolution and clustering methods. The CDD curation project is new, and this year's results consist primarily of recruiting and training PhD biologists as CDD curators. Nonetheless, this group has produced several hundred curated CDD families, which are now available via NCBI's protein classification servers.
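The step of splitting a "parent" alignment into candidate "children" can be pictured with a deliberately simplified sketch. It assumes the representative sequences are already aligned (equal-length strings, `-` for gaps) and substitutes single-linkage clustering on pairwise percent identity for the combination of clustering, domain architecture, and phylogeny described above. The identity threshold, the dictionary layout of the hierarchy, the `_subN` child names, and all function names are hypothetical; the output is only a list of candidate groups for a curator to review, not an orthology inference.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def single_linkage_groups(seqs, threshold):
    """Group sequence names linked by chains of pairs at or above `threshold` (union-find)."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def build_hierarchy(family_name, seqs, threshold):
    """Hypothetical parent/children layout: one child node per candidate group."""
    groups = single_linkage_groups(seqs, threshold)
    return {
        "name": family_name,
        "members": sorted(seqs),
        "children": [
            {"name": f"{family_name}_sub{i + 1}", "members": g, "children": []}
            for i, g in enumerate(groups)
        ],
    }


if __name__ == "__main__":
    toy = {  # made-up aligned sequences forming two tight pairs
        "s1": "MKVLAGW",
        "s2": "MKVLAGF",
        "s3": "QRTEHNY",
        "s4": "QRTEHNW",
    }
    tree = build_hierarchy("famX", toy, threshold=80.0)
    for child in tree["children"]:
        print(child["name"], child["members"])
```

With the toy input, each pair shares 6 of 7 residues (about 86% identity) while cross-pair identity is at most about 14%, so an 80% threshold yields two children, `['s1', 's2']` and `['s3', 's4']`. Single linkage was chosen only because it is short to write; it tends to chain distant sequences together, which is one reason the abstract relies on curators and on agreement among several methods.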
