Improvements and Extensions to the BLAST Algorithms
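Basic Information
- Grant number: 6546809
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source:
- Keywords:
Project Abstract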
The BLAST family of protein and DNA database search programs constitutes one of the key services offered by the NCBI. These programs are currently run on NCBI servers about 70,000 times during a typical weekday. This project represents an ongoing effort to improve and extend the functionality of these programs. Efforts this year have focused on improving the PSI-BLAST program. PSI-BLAST searches a database of protein sequences using a position-specific score matrix (PSSM) as the query. The PSSMs used are generally constructed on the fly, through multiple iterations of database searching, initiated with a standard protein sequence. PSI-BLAST has been widely used to annotate proteins inferred from new DNA sequences, and to generate sets of PSSMs representing large classes of proteins. In order to improve the sensitivity of the PSI-BLAST program to distant sequence relationships, we developed a system to evaluate the program's performance. For a set of about 100 query sequences, experts in the group compiled an exhaustive list of related proteins in yeast. Each query can then be compared to a comprehensive protein sequence database through an arbitrary number of PSI-BLAST iterations, and the resulting PSSM compared to the complete set of yeast sequences. This procedure generates a list of yeast sequences ordered by E-value, from which a plot of false positives vs. true positives may be obtained. We used our evaluation system to improve the average sensitivity of PSI-BLAST to distant relationships.
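The evaluation procedure described above (rank the hits by E-value, then count false positives against true positives) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `fp_tp_curve`, the `(sequence_id, e_value)` hit format, and the sequence identifiers are all hypothetical and do not come from the project itself.

```python
from typing import Iterable, List, Set, Tuple


def fp_tp_curve(
    hits: Iterable[Tuple[str, float]],
    known_relatives: Set[str],
) -> List[Tuple[int, int]]:
    """Return cumulative (false_positives, true_positives) counts.

    hits: (sequence_id, e_value) pairs from one search (assumed format).
    known_relatives: ids from the curated list of true relatives of the query.
    """
    # Most significant hits (lowest E-value) come first.
    ranked = sorted(hits, key=lambda hit: hit[1])
    curve: List[Tuple[int, int]] = []
    false_pos = true_pos = 0
    for seq_id, _e_value in ranked:
        if seq_id in known_relatives:
            true_pos += 1
        else:
            false_pos += 1
        # One point on the plot per hit, walking down the ranked list.
        curve.append((false_pos, true_pos))
    return curve


# Invented identifiers and E-values, purely for illustration.
example_hits = [("seq3", 0.2), ("seq1", 1e-40), ("seq4", 4.0), ("seq2", 3e-5)]
example_truth = {"seq1", "seq3"}
example_curve = fp_tp_curve(example_hits, example_truth)
print(example_curve)  # [(0, 1), (1, 1), (1, 2), (2, 2)]
```

A curve that climbs in true positives before accumulating false positives indicates better sensitivity; comparing such curves before and after a program change is one plausible way to use the evaluation set.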
The changes adopted include:
1) Filtering the database sequences, rather than the query, for segments of restricted amino acid composition;
2) Using the Smith-Waterman algorithm to construct any alignments reported;
3) Improving the numerical precision in the calculation of amino acid pair target/background frequency ratios;
4) Adopting an improved estimation of statistical and edge-effect parameters;
5) Calculating E-values based upon the composition of the database sequence hit rather than upon a standard protein amino acid composition;
6) Letting gaps in a given alignment column render the projected amino acid frequencies for that column closer to background frequencies;
7) Adopting composition-based statistics only when they have the effect of increasing E-values;
8) Decreasing the pseudocount constant from 10 to 9;
9) Increasing the percent difference from other sequences required for inclusion in the multiple alignment from 2% to 6%.
All these changes have been incorporated into the version of PSI-BLAST now available on the public NCBI web page. The new program is much less likely to return false positives with spuriously low E-values.
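Item 2 in the list above names the Smith-Waterman algorithm for constructing the reported alignments. The following is a minimal sketch of that local-alignment technique, not the PSI-BLAST implementation: the identity-based `match`/`mismatch` scores and the linear `gap` penalty are simplifying assumptions, whereas a real search would score residues against the PSSM.

```python
from typing import List, Tuple


def smith_waterman(
    a: str,
    b: str,
    match: int = 2,
    mismatch: int = -1,
    gap: int = -2,
) -> Tuple[int, str, str]:
    """Best local alignment of a and b (toy scores, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; flooring each cell at 0 lets an alignment restart.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("QQQHEAGQQQ", "WWHEAGWW"))  # (8, 'HEAG', 'HEAG')
```

Unlike a heuristic extension, the full dynamic-programming fill considers every possible local alignment, which is why it suits the final, reported alignment even though it costs time proportional to the product of the two sequence lengths.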
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by STEPHEN F ALTSCHUL