STATISTICS OF SEQUENCE COMPARISON

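Basic Information

Grant number: 6111055
Principal investigator: Stephen F Altschul
Amount: --
Host institution: NATIONAL LIBRARY OF MEDICINE
Host institution country: United States
Project category:
Fiscal year:
Funding country: United States
Project period:
Project status: Not concluded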

Source: https://reporter.nih.gov/project-details/6111055
Keywords: computer assisted sequence analysis; computer program/software; nucleic acid sequence; protein sequence; statistics/biometry

Project Abstract

This project is a continuing study of which similarities can be expected to arise purely by chance when two protein or DNA sequences are compared. A subsidiary, related question concerns the design of scoring systems that best distinguish biologically meaningful patterns from chance similarities. Work this year includes:

a) A new method for scoring gaps in protein alignments, and an empirical study of the statistics of optimal alignment scores under this scoring system. Because a single mutational event can delete or insert several residues at once, affine gap costs for sequence alignment charge a fixed penalty for the existence of a gap plus a further penalty proportional to the gap's length. Structural and multiple alignments of distantly related proteins show that conserved residues frequently fall into ungapped blocks separated by relatively non-conserved regions. To take advantage of this structure, a simple generalization of affine gap costs was proposed that allows such non-conserved regions to be effectively ignored. Scores of local alignments computed with these generalized gap costs were shown empirically to follow an extreme value distribution. In many cases, generalized affine gap costs yield better alignments, both in statistical significance and in alignment accuracy. Guidelines for selecting generalized affine gap costs were also developed.

b) Statistics for local alignments seeded by a pattern. The recently developed PHI-BLAST program constructs optimal local alignments seeded by a pattern that a researcher specifies. The random distribution of these local alignments was studied both analytically and empirically. The resulting statistics were incorporated into PHI-BLAST, allowing it in many instances to detect significant similarity between homologous proteins that are not recognizably related by traditional single-pass database search methods.
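
Item a) builds on affine gap costs, under which a gap of length k costs a + b·k (a for the gap's existence, b per residue). The following is a minimal sketch of how optimal local alignment scores under such costs can be computed with the Smith-Waterman algorithm and Gotoh's three-state recurrences. It assumes a simple match/mismatch scoring function in place of a protein substitution matrix. The function names, the +5/-4 scores, and the gap costs 11 and 1 are illustrative assumptions, not the project's settings. The project's generalized gap costs are not implemented here.

```python
from typing import Callable

NEG_INF = float("-inf")


def simple_score(a: str, b: str) -> int:
    """Hypothetical scoring: +5 for identical residues, -4 otherwise."""
    return 5 if a == b else -4


def local_affine_score(x: str, y: str,
                       score: Callable[[str, str], int] = simple_score,
                       gap_open: int = 11, gap_extend: int = 1) -> int:
    """Optimal local alignment score of x and y when a gap of length k
    costs gap_open + gap_extend * k (Smith-Waterman with Gotoh's recurrences).
    Only the score is returned; traceback is omitted."""
    n = len(y)
    h_prev = [0] * (n + 1)         # row i-1 of H: best alignment ending at each cell
    f_prev = [NEG_INF] * (n + 1)   # row i-1 of F: ending with a residue of x against a gap
    best = 0
    for i in range(1, len(x) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF                # E[i][j]: ending with a residue of y against a gap
        for j in range(1, n + 1):
            # Either extend an existing gap (gap_extend) or open a new one
            # (gap_open + gap_extend for its first residue).
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            diag = h_prev[j - 1] + score(x[i - 1], y[j - 1])
            # Local alignment: a score never drops below 0 (start afresh).
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


x = "A" * 8 + "C" * 8
y = "A" * 8 + "GG" + "C" * 8
print(local_affine_score(x, y))  # prints 67
```

In this example the best local alignment opens a single gap of length 2 opposite "GG" (cost 11 + 2 = 13) and scores 16 × 5 − 13 = 67. The ungapped alternative scores only 62, because it must absorb two mismatches and loses two matches. Only two rows of each matrix are kept, so memory grows with the length of y alone.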

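The extreme value result in item a) can be made concrete. For ungapped local alignments of sequences of lengths m and n, Karlin-Altschul theory gives the expected number of chance alignments scoring at least S as E = K·m·n·e^(−λS), with P(best score ≥ S) ≈ 1 − e^(−E). For gapped scores like those studied here, λ and K must instead be estimated from the scores of random sequence comparisons. The sketch below shows one simple estimator, the method of moments for a Gumbel law. The project may have used a different fitting procedure. sample_gumbel only generates synthetic data to exercise the fit; in a real study the scores would come from aligning many pairs of random sequences. All function names and the lengths 300 and 400 are hypothetical, and finite-length edge effects are ignored.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments fit of an extreme value (Gumbel) law,
    P(S >= x) ~= 1 - exp(-exp(-lam * (x - mu))). Returns (lam, mu)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))   # Gumbel variance = pi^2 / (6 lam^2)
    mu = mean - EULER_GAMMA / lam         # Gumbel mean = mu + gamma / lam
    return lam, mu


def karlin_k(lam, mu, m, n):
    """Re-express mu as K in E = K * m * n * exp(-lam * x), for lengths m and n."""
    return math.exp(lam * mu) / (m * n)


def evalue(x, lam, k, m, n):
    """Expected number of distinct chance local alignments with score >= x."""
    return k * m * n * math.exp(-lam * x)


def pvalue(x, lam, k, m, n):
    """Probability that the best chance score is at least x: 1 - exp(-E)."""
    return -math.expm1(-evalue(x, lam, k, m, n))


def sample_gumbel(lam, mu, size, seed=0):
    """Synthetic Gumbel draws standing in for simulated optimal scores."""
    rng = random.Random(seed)
    out = []
    while len(out) < size:
        u = rng.random()
        if u > 0.0:                        # avoid log(0)
            # Inverse of the Gumbel CDF exp(-exp(-lam * (x - mu))).
            out.append(mu - math.log(-math.log(u)) / lam)
    return out


scores = sample_gumbel(0.3, 20.0, 20000, seed=1)
lam, mu = fit_gumbel_moments(scores)
k = karlin_k(lam, mu, 300, 400)
print(lam, mu, k, pvalue(50, lam, k, 300, 400))
```

Keeping λ and K separate from the sequence lengths is the design choice that matters: once fitted, the same two parameters convert any score into an E-value or P-value for other sequence lengths. Method-of-moments estimates are sensitive to the upper tail, so maximum-likelihood fitting is a common alternative.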
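
Item b) concerns alignments seeded by a pattern. The following is a minimal sketch of the seeding step only. It assumes patterns written in a PROSITE-style syntax such as C-x(2,4)-C-[LIVM]; PHI-BLAST accepts PROSITE-style patterns, but this converter handles just x, single residues, [..], {..} and repeat counts. The sketch lists where a pattern occurs in a sequence. In a PHI-BLAST-style search, each occurrence would anchor an alignment that is then extended on both sides and scored with pattern-specific statistics; neither step is shown. The function names are hypothetical and not part of BLAST.

```python
import re

# One PROSITE-style element: x, a residue, [allowed], or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern, e.g. 'C-x(2,4)-C-[LIVM]',
    into a Python regular expression. Anchors and other syntax are unsupported."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            rx = "."                          # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"      # any residue except those listed
        else:
            rx = core                         # residue or [..] class is valid regex
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(rx)
    return "".join(parts)


def pattern_seeds(pattern: str, seq: str):
    """(start, end) of each pattern occurrence in seq, overlapping hits included.
    At each start position only the first match the regex engine finds is kept."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.start() + len(m.group(1))) for m in rx.finditer(seq)]


print(pattern_seeds("C-x(2)-C", "ACAACAACA"))  # prints [(1, 5), (4, 8)]
```

The zero-width lookahead lets overlapping occurrences, such as the two hits sharing the cysteine at position 4 above, each serve as a separate seed.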