STATISTICS OF SEQUENCE COMPARISON
Basic Information
- Award number: 6290478
- Host institution country: United States
- Funding country: United States
- Project status: Not yet concluded
Project Abstract
This project is a continuing study of what similarities can be expected to arise purely by chance when two protein or DNA sequences are compared. A subsidiary, related question concerns how to define scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Work this year includes:

a) The development of a more accurate method for assessing statistical significance in the context of a database search. The assignment of E-values in the BLAST family of programs has depended on assuming a standard composition for database sequences. As a result, alignments between sequences that share a similarly biased composition can receive inappropriately low E-values. A new approach re-estimates the relevant statistical parameters for each pair of sequences that yields a seemingly significant alignment, and the new parameters lead to a revised estimate of statistical significance. This can have a major effect on the output of PSI-BLAST, where the inclusion of a single false positive in one iteration can corrupt all subsequent results.

b) The implementation of a fast method for obtaining maximum-likelihood estimates of the statistical parameters for local alignment scores. Estimating these parameters for gapped local alignments has been very time-consuming: estimating the scale parameter to within 0.5% has required optimal local alignment scores from 24,000 pairwise comparisons, which takes over two hours of CPU time on a standard current workstation. Recent work by T. Hwa and colleagues at UCSD suggests a much faster way to estimate the relevant parameters, based on collecting scores from local alignment "islands". This method reduces the required computation time by a factor of 10 to 20. I have implemented a modified version of the Hwa et al. method and initiated plans for collaboration.

Keywords: alignments, statistics, substitution scores, gap scores, extreme value distribution
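Part (a) turns on the fact that the Karlin-Altschul parameter λ depends on residue composition. The sketch below is a simplified, ungapped analogue of composition-based re-estimation, not BLAST's implementation: it solves the Karlin-Altschul equation Σ p_i q_j e^(λ s_ij) = 1 using the actual compositions p and q of the two sequences, then plugs λ into E = K m n e^(−λS). The toy DNA scoring scheme (+1 match, −1 mismatch), the helper names, and treating K as a fixed input rather than re-estimating it are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

ALPHABET = "ACGT"
Scores = Dict[Tuple[str, str], int]


def match_mismatch_scores(match: int = 1, mismatch: int = -1) -> Scores:
    """Toy DNA scoring scheme (an illustrative assumption, not a BLAST default)."""
    return {(a, b): match if a == b else mismatch for a in ALPHABET for b in ALPHABET}


def composition(seq: str) -> Dict[str, float]:
    """Residue frequencies of one sequence; characters outside ALPHABET are ignored."""
    counts = {a: 0 for a in ALPHABET}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no residues from the alphabet")
    return {a: c / total for a, c in counts.items()}


def ungapped_lambda(p: Dict[str, float], q: Dict[str, float],
                    scores: Scores, tol: float = 1e-12) -> float:
    """Positive root of sum_ij p_i * q_j * exp(lambda * s_ij) = 1 (ungapped Karlin-Altschul)."""
    pairs = [(p[a] * q[b], scores[(a, b)]) for a in ALPHABET for b in ALPHABET]
    # The theory requires a negative expected score and some attainable positive score.
    if sum(w * s for w, s in pairs) >= 0:
        raise ValueError("expected score must be negative")
    if not any(w > 0 and s > 0 for w, s in pairs):
        raise ValueError("a positive score must occur with nonzero probability")

    def f(lam: float) -> float:
        return sum(w * math.exp(lam * s) for w, s in pairs) - 1.0

    # f is convex with f(0) = 0 and f'(0) < 0, so f <= 0 on (0, root] and f > 0 beyond it.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:  # bisection keeps f(lo) <= 0 < f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def evalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance alignments scoring >= `score` in an m x n search."""
    return k * m * n * math.exp(-lam * score)
```

With uniform composition the root is λ = ln 3 ≈ 1.099. When both sequences are AT-rich (A and T at 0.4, C and G at 0.1) it falls to ln(33/17) ≈ 0.663, so a given score receives a larger, more conservative E-value than the standard-composition λ would assign. Bisection is used because the left-hand side is convex in λ with a single positive root, which makes the bracket easy to maintain.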
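The figures quoted in part (b) are consistent with a standard asymptotic result, assuming that "0.5%" means one standard error and that the optimal scores are fitted by maximum likelihood to a Gumbel (extreme value) distribution with scale β = 1/λ. The Fisher information of the Gumbel distribution gives the variance of the scale estimate, and by the delta method λ̂ = 1/β̂ has the same relative standard error:

```latex
\operatorname{Var}(\hat\beta) \approx \frac{6\,\beta^{2}}{\pi^{2} n}
\quad\Longrightarrow\quad
\frac{\operatorname{SE}(\hat\lambda)}{\lambda} \approx \frac{\operatorname{SE}(\hat\beta)}{\beta}
  \approx \sqrt{\frac{6}{\pi^{2} n}} \approx \frac{0.780}{\sqrt{n}},
\qquad
n = 24{,}000:\ \frac{0.780}{154.9} \approx 0.0050 = 0.5\%.
```

Because the relative error shrinks only as 1/√n, halving it would require four times as many comparisons, which is why a cheaper source of samples matters.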
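Part (b) relies on the island method of Hwa and colleagues. Below is a minimal sketch of the estimation step only, under several assumptions. It assumes that island peak scores have already been collected from a long comparison of random sequences; finding the islands requires a modified Smith-Waterman recursion that is not shown. It also assumes integer scores and that the number of islands with peak ≥ s in a search area A is about K·A·e^(−λs). Under these conditions, peaks at or above a cutoff c follow a geometric law whose maximum-likelihood estimate has the closed form λ̂ = ln(1 + 1/⟨s − c⟩). The function name `island_mle` and its parameters are hypothetical. The K estimate omits the finite-size corrections a real implementation would need, and none of this should be read as the author's modified version.

```python
import math
from typing import Iterable, Tuple


def island_mle(peak_scores: Iterable[int], cutoff: int, area: float) -> Tuple[float, float]:
    """Hypothetical helper: estimate (lambda, K) from island peak scores.

    Assumes integer scores and N(peak >= s) ~ K * area * exp(-lambda * s),
    so the excesses (peak - cutoff) of peaks at or above `cutoff` are geometric.
    """
    tail = [s for s in peak_scores if s >= cutoff]
    if not tail:
        raise ValueError("no island peaks at or above the cutoff")
    mean_excess = sum(s - cutoff for s in tail) / len(tail)
    if mean_excess <= 0:
        raise ValueError("need some peaks strictly above the cutoff")
    # Geometric MLE: P(excess = k) = (1 - e^-lambda) * e^(-lambda * k)
    # gives e^-lambda = mean / (1 + mean), i.e. lambda = ln(1 + 1/mean).
    lam = math.log1p(1.0 / mean_excess)
    # Invert N(>= cutoff) ~ K * area * exp(-lambda * cutoff); no finite-size correction.
    k = len(tail) * math.exp(lam * cutoff) / area
    return lam, k
```

For example, `island_mle([10, 11, 12, 11, 3, 5], cutoff=10, area=4096)` ignores the two peaks below the cutoff, finds a mean excess of 1, and returns λ̂ = ln 2 ≈ 0.693 with K̂ ≈ 1.0. The speed-up comes from the fact that a single large comparison yields many islands, whereas the slower approach described above yields only one optimal score per comparison.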
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by STEPHEN F ALTSCHUL