STATISTICS OF SEQUENCE COMPARISON
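Basic information
- Award number: 6111055
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Start and end dates:
- Project status: Not concluded
- Source:
- Keywords:
Project abstract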
This project is a continuing study of what similarities can be
expected to occur purely by chance when two protein or DNA sequences
are compared. A subsidiary, related question concerns how to define
scoring systems that best distinguish biologically meaningful patterns
from chance similarities. Work this year includes:

a) A new method for scoring gaps within protein alignments, and an
empirical study of the statistics of optimal alignment scores under
this scoring system. Because a single mutational event can delete or
insert multiple residues, affine gap costs for sequence alignment
charge a penalty for the existence of a gap plus a further
length-dependent penalty. From structural or multiple alignments of
distantly related proteins, it has been observed that conserved
residues frequently fall into ungapped blocks separated by relatively
non-conserved regions. To take advantage of this structure, a simple
generalization of affine gap costs was proposed that allows
non-conserved regions to be effectively ignored. The scores of local
alignments computed with these generalized gap costs were shown
empirically to follow an extreme value distribution. In many cases,
generalized affine gap costs yield superior alignments from the
standpoints of both statistical significance and alignment accuracy.
Guidelines for selecting generalized affine gap costs were developed.

b) Statistics for local alignments seeded by a pattern. The recently
developed PHI-BLAST program constructs optimal local alignments
seeded by a pattern specified by a researcher. The random distribution
of the scores of these local alignments was studied both analytically
and empirically. The resulting statistics were incorporated into
PHI-BLAST, in many instances allowing it to detect significant
similarity between homologous proteins that are not recognizably
related by traditional single-pass database search methods.
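The affine gap costs described in a) (an existence penalty plus a length-dependent penalty) can be made concrete with the standard Gotoh-style dynamic program for local (Smith-Waterman) alignment. This is a minimal sketch of ordinary affine gap scoring, where a gap of length k costs gap_open + gap_extend * k; it does not implement the generalized affine gap costs proposed in this project, whose exact form the abstract does not give. The function name `local_affine_score`, the match/mismatch scores and the gap parameters are hypothetical illustrative choices.

```python
"""Local alignment score with affine gap costs (Gotoh-style recurrences).

Illustrative sketch only: the match/mismatch scores and gap parameters
are hypothetical, and this is ordinary affine gap scoring, not the
generalized affine gap costs described in the abstract.
"""

NEG_INF = float("-inf")


def local_affine_score(s: str, t: str, match: int = 2, mismatch: int = -1,
                       gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score; a gap of length k costs gap_open + gap_extend * k."""
    n, m = len(s), len(t)
    # H[i][j]: best local alignment score ending at cell (i, j) in any state (floored at 0).
    # E[i][j]: best score ending with t[j-1] aligned to a gap (gap in s).
    # F[i][j]: best score ending with s[i-1] aligned to a gap (gap in t).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an open gap (pay gap_extend) or open a new one
            # (pay the existence cost gap_open plus gap_extend for its first residue).
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open - gap_extend)
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # The 0 lets a local alignment start afresh at any position.
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Two conserved ACGT blocks separated by a two-residue insertion in t:
    # 8 matches * 2 - (3 + 2 * 1) for the length-2 gap = 11.
    print(local_affine_score("ACGTACGT", "ACGTGGACGT"))  # 11
```

Raising `gap_open` far enough (for example to 20) makes any gap unprofitable in this example, and the best score drops to that of the best ungapped local alignment (9 here). The separate E and F matrices are what let the recurrence charge the existence cost once per gap rather than once per gapped residue.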
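Both parts of the abstract concern the chance distribution of local alignment scores, which is commonly summarized by the extreme value (Karlin-Altschul) form P(S ≥ x) ≈ 1 − exp(−K·m·n·e^(−λx)) for sequences of lengths m and n. Gapped scores have no simple closed-form λ and K, so they are typically estimated from the scores of random sequence pairs. The sketch below illustrates that empirical route under loudly simplifying assumptions: ungapped local alignment of uniformly random DNA with hypothetical +1/−1 scores, and a method-of-moments Gumbel fit. It is not the fitting procedure used in this project, and the helper names (`simulate`, `fit_gumbel`, `evalue`, `pvalue`) are hypothetical.

```python
"""Method-of-moments extreme-value fit to chance local-alignment scores.

Illustrative sketch under stated assumptions: ungapped local alignment of
random DNA with hypothetical +1/-1 scores. Not the project's procedure.
"""
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def best_ungapped_local_score(s: str, t: str, match: int = 1, mismatch: int = -1) -> int:
    """Score of the best ungapped local alignment of s and t."""
    best = 0
    # Each offset is one diagonal of the comparison matrix; scan it with
    # Kadane's maximum-subarray recurrence (running score floored at 0).
    for offset in range(-(len(s) - 1), len(t)):
        run = 0
        for i in range(max(0, -offset), min(len(s), len(t) - offset)):
            run = max(0, run + (match if s[i] == t[i + offset] else mismatch))
            best = max(best, run)
    return best


def simulate(num_pairs: int = 200, length: int = 60, seed: int = 1) -> list:
    """Best local scores for pairs of independent, uniformly random DNA sequences."""
    rng = random.Random(seed)
    scores = []
    for _ in range(num_pairs):
        s = "".join(rng.choice("ACGT") for _ in range(length))
        t = "".join(rng.choice("ACGT") for _ in range(length))
        scores.append(best_ungapped_local_score(s, t))
    return scores


def fit_gumbel(scores: list, m: int, n: int) -> tuple:
    """Fit P(S >= x) ~ 1 - exp(-K*m*n*exp(-lam*x)) by matching mean and variance."""
    lam = math.pi / (statistics.stdev(scores) * math.sqrt(6))  # Gumbel sd = pi/(lam*sqrt(6))
    mode = statistics.fmean(scores) - EULER_GAMMA / lam          # Gumbel mean = mode + gamma/lam
    K = math.exp(lam * mode) / (m * n)                           # since K*m*n = exp(lam*mode)
    return lam, K


def evalue(score: float, lam: float, K: float, m: int, n: int) -> float:
    """Expected number of chance local alignments scoring at least `score`."""
    return K * m * n * math.exp(-lam * score)


def pvalue(score: float, lam: float, K: float, m: int, n: int) -> float:
    """Probability of at least one chance alignment scoring at least `score`."""
    return 1.0 - math.exp(-evalue(score, lam, K, m, n))


if __name__ == "__main__":
    L = 60
    lam, K = fit_gumbel(simulate(length=L), L, L)
    print(f"estimated lambda = {lam:.3f}, K = {K:.3f}")
    for x in (10, 15, 20):
        print(x, evalue(x, lam, K, L, L), pvalue(x, lam, K, L, L))
```

For these scores the theoretical ungapped λ is ln 3 ≈ 1.10, the positive root of ¼e^λ + ¾e^(−λ) = 1. With sequences this short, edge effects and the integer score lattice mean the moment estimate should be expected to match it only roughly. Gapped or pattern-seeded scores would each need their own fitted λ and K.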
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Stephen F Altschul