
STATISTICS OF SEQUENCE COMPARISON

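Basic Information

Grant number:
6290478
Principal investigator:
STEPHEN F ALTSCHUL
Funding amount:
--
Host institution:
NATIONAL LIBRARY OF MEDICINE
Host institution country:
United States
Project category:
Fiscal year:
Funding country:
United States
Project period:
to
Project status:
Not concluded

Source:
https://reporter.nih.gov/project-details/6290478
Keywords: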
computer assisted sequence analysis; computer program/software; nucleic acid sequence; protein sequence; statistics/biometry

Project Abstract

This project is a continuing study of questions concerning what similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A subsidiary and related question concerns the definition of scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Work this year includes:

a) The development of a more accurate method to assess statistical significance in the context of a database search. The assignment of E-values in the BLAST family of programs has depended upon the use of a standard composition for database sequences. This can result in alignments involving sequences with similarly biased compositions receiving inappropriately low E-values. A new approach re-estimates the relevant statistical parameters for each pair of sequences that yield a seemingly significant alignment. The new parameters lead to a revised estimate of statistical significance. This can have a major effect on the output of PSI-BLAST, where the inclusion of a false positive during one iteration can corrupt all further results.

b) The implementation of a fast method for extracting a maximum-likelihood estimate of statistical parameters for local alignment scores. The estimation of statistical parameters for gapped local alignments has been very time consuming. To estimate the scale parameter to within 0.5% has required optimal local alignment scores from 24,000 pairwise comparisons, requiring over two hours of CPU time on a standard current workstation. Recently, some work of T. Hwa and colleagues at UCSD has suggested a much faster way of estimating the relevant parameters, involving the collection of scores from local alignment islands. This method has reduced the computation time required by a factor of 10 to 20. I have implemented a modified version of the Hwa et al. method, and initiated plans for collaboration.

- alignments, statistics, substitution scores, gap scores, extreme value distribution
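
The abstract refers to maximum-likelihood estimation of extreme value distribution parameters and to E-values without showing either calculation. The Python sketch below is one illustrative way to do both. It is not the island method of Hwa et al., not the per-pair re-estimation described in a), and not the BLAST implementation. It assumes the scores follow a Gumbel distribution with scale parameter `lam` and location `mu`, and it uses the standard Karlin-Altschul form E = K·m·n·e^(-λS), which the abstract does not state. All function names and the numeric values (0.27, 30.0, sequence lengths of 250) are hypothetical choices for the example, and the scores are simulated rather than taken from real alignments.

```python
import math
import random


def sample_gumbel(lam, mu):
    """Draw one simulated score from a Gumbel distribution (illustrative only)."""
    u = max(random.random(), 1e-300)  # keep u in (0, 1) so both logs are defined
    return mu - math.log(-math.log(u)) / lam


def fit_gumbel_ml(scores, tol=1e-10):
    """Maximum-likelihood estimate of (lam, mu) for Gumbel-distributed scores."""
    n = len(scores)
    lowest = min(scores)
    if max(scores) == lowest:
        raise ValueError("scores must not all be equal")
    mean = sum(scores) / n

    def weights(lam):
        # Shift by the lowest score so the exponentials cannot overflow.
        return [math.exp(-lam * (s - lowest)) for s in scores]

    def g(lam):
        # The ML estimate of lam is the root of g; g decreases as lam grows.
        w = weights(lam)
        weighted_mean = sum(wi * s for wi, s in zip(w, scores)) / sum(w)
        return 1.0 / lam - mean + weighted_mean

    # Bracket the root, then bisect. Assumes the true lam exceeds 1e-6.
    lo, hi = 1e-6, 1.0
    while g(hi) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mu = lowest - math.log(sum(weights(lam)) / n) / lam
    return lam, mu


def k_from_mu(lam, mu, m, n):
    """Convert the fitted location into K, using mu = ln(K*m*n) / lam."""
    return math.exp(lam * mu) / (m * n)


def e_value(score, lam, k, m, n):
    """Expected number of chance alignments scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    random.seed(0)
    simulated = [sample_gumbel(0.27, 30.0) for _ in range(5000)]
    lam_hat, mu_hat = fit_gumbel_ml(simulated)
    k_hat = k_from_mu(lam_hat, mu_hat, 250, 250)
    print("lambda:", lam_hat, "mu:", mu_hat, "K:", k_hat)
    print("E-value at score 50:", e_value(50.0, lam_hat, k_hat, 250, 250))
```

Bisection is used instead of Newton's method because the likelihood equation for `lam` is monotone, so a bracketed search cannot diverge. A production estimator would also need to account for finite-length edge effects, which this sketch ignores.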
