
Statistics of Sequence Comparison


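Basic information

Grant number:
9555728
Principal investigator:
STEPHEN F ALTSCHUL
Amount:
$192,800
Host institution:
NATIONAL LIBRARY OF MEDICINE
Host institution country:
United States
Project category:
Fiscal year:
Funding country:
United States
Project period:
to
Project status:
Not concluded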

Project abstract

The current direction of this project, pursued in collaboration with Dr. Andrew Neuwald of the Institute for Genome Sciences and the Department of Biochemistry & Molecular Biology at the University of Maryland School of Medicine, continued throughout this year. A previous focus had been the development of an improved multiple alignment method that could identify the common elements shared by large and diverse protein superfamilies. A central aim this year was to extend this method to a hierarchical multiple alignment model. Such a model is based on the fact that large protein superfamilies have frequently diversified to fulfill distinct functional roles within different subfamilies. Each subfamily has distinct structural constraints, which yield distinct amino acid frequency vectors at particular positions characteristic of that subfamily. Although the amino acids at different positions may be independent within a subfamily, the changes in frequency vectors across the multiple positions characteristic of each subfamily yield the appearance of correlation between positions when a simple, non-hierarchical model of a superfamily is constructed. Earlier approaches have modeled these apparent correlations directly, using pairwise coupling terms, but we model them by constructing an explicit hierarchical model, with individual sequences assigned to distinct nodes within the hierarchy. We apply the Minimum Description Length principle to ensure that the hierarchical models we construct do not overfit the data but have statistical support. We completed the development of a hierarchical multiple alignment program and applied it to the analysis of N-acetyltransferases. Based upon statistical correlations, this approach identified a number of subfamilies, characterized by protein positions with distinctive amino acid usage, which suggested specific, previously uncharacterized biological mechanisms. A paper describing this work was published.
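The claim that pooling subfamilies creates apparent correlation between positions can be illustrated with a small simulation. The sketch below is not the project's program: the two-position "sequences", the residue frequencies, and the function names are all assumptions chosen for illustration. Within each simulated subfamily the two positions are sampled independently, yet mutual information measured on the pooled data is clearly positive.

```python
import random
from collections import Counter
from math import log2


def sample_subfamily(rng, n, freq_pos1, freq_pos2):
    """Draw n two-residue sequences; the two positions are sampled independently."""
    residues1, weights1 = zip(*freq_pos1.items())
    residues2, weights2 = zip(*freq_pos2.items())
    return [
        (rng.choices(residues1, weights1)[0], rng.choices(residues2, weights2)[0])
        for _ in range(n)
    ]


def mutual_information(pairs):
    """Empirical mutual information (in bits) between the two positions."""
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    return sum(
        (count / n) * log2(count * n / (first[a] * second[b]))
        for (a, b), count in joint.items()
    )


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical subfamilies with opposite residue preferences at both positions.
sub_a = sample_subfamily(rng, 2000, {"D": 0.9, "N": 0.1}, {"K": 0.9, "R": 0.1})
sub_b = sample_subfamily(rng, 2000, {"D": 0.1, "N": 0.9}, {"K": 0.1, "R": 0.9})

mi_a = mutual_information(sub_a)               # near zero: positions independent
mi_b = mutual_information(sub_b)               # near zero: positions independent
mi_pooled = mutual_information(sub_a + sub_b)  # inflated by mixing the subfamilies

print(f"within subfamily A: {mi_a:.4f} bits")
print(f"within subfamily B: {mi_b:.4f} bits")
print(f"pooled superfamily: {mi_pooled:.4f} bits")
```

In this toy setting, assigning each sequence to its own subfamily node removes the apparent dependence without any pairwise coupling term, which is the intuition behind the hierarchical model described above.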
Another aim of this project, launched last year, was significantly advanced. The hierarchical models constructed by our approach include the explicit description of a set of "distinguishing positions" characteristic of each node in the hierarchy. When mapped onto available three-dimensional structures, these positions often cluster together in space and can aid in the development of specific hypotheses for the biological mechanisms underlying the diversification of protein subfamilies. We developed appropriate measures for the clustering of distinguishing positions and derived methods to assess their statistical significance. A paper describing this work is in press. Work continues on extending the clustering measures so that they capture more biologically relevant information.
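The abstract does not specify the clustering measures or the significance methods that were developed. As a generic illustration only, the sketch below scores a set of positions by their mean pairwise distance in three-dimensional space and estimates a p-value with a permutation test against random position sets of the same size. The statistic, the toy coordinates, and all names are assumptions, not the authors' method.

```python
import random
from itertools import combinations
from math import dist


def mean_pairwise_distance(coords, positions):
    """Average Euclidean distance over all pairs of the chosen positions."""
    pairs = list(combinations(positions, 2))
    return sum(dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def clustering_p_value(coords, positions, n_permutations=2000, seed=0):
    """Permutation p-value: how often a random position set of the same size
    is at least as tightly packed as the observed set."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, positions)
    indices = range(len(coords))
    hits = sum(
        mean_pairwise_distance(coords, rng.sample(indices, len(positions))) <= observed
        for _ in range(n_permutations)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Toy "structure": 60 residues spaced one unit apart along a line (hypothetical).
coords = [(float(i), 0.0, 0.0) for i in range(60)]

p_clustered = clustering_p_value(coords, [10, 11, 12, 13, 14])
p_scattered = clustering_p_value(coords, [0, 15, 30, 45, 59])

print(f"adjacent positions:  p = {p_clustered:.4f}")
print(f"scattered positions: p = {p_scattered:.4f}")
```

A real analysis would take coordinates from a solved structure and would need a null model that respects sequence neighbors and surface exposure; the uniform random sets used here are the simplest possible baseline.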
