IMPROVEMENTS AND EXTENSIONS TO THE BLAST ALGORITHMS
Basic information
- Grant number: 6432754
- Host institution country: United States
- Funding country: United States
- Project status: Not concluded
Project abstract
The BLAST family of protein and DNA database search programs constitutes one of the key services offered by the NCBI. These programs are currently run on NCBI servers about 70,000 times during a typical weekday. This project represents an ongoing effort to improve and extend the functionality of these programs. Efforts this year have focused on improving the PSI-BLAST program. PSI-BLAST searches a database of protein sequences using a position-specific score matrix (PSSM) as the query. The PSSMs used are generally constructed on the fly, through multiple iterations of database searching, initiated with a standard protein sequence. PSI-BLAST has been widely used to annotate proteins inferred from new DNA sequences, and to generate sets of PSSMs representing large classes of proteins.
In order to improve the sensitivity of the PSI-BLAST program to distant sequence relationships, we developed a system to evaluate the program's performance. For a set of about 100 query sequences, experts in the group compiled an exhaustive list of related proteins in yeast. Each query can then be compared to a comprehensive protein sequence database through an arbitrary number of PSI-BLAST iterations, and the resulting PSSM compared to the complete yeast sequence. This procedure generates a list of yeast sequences ordered by E-value, from which a plot of false positives vs. true positives may be obtained. We used our evaluation system to improve the average sensitivity of PSI-BLAST to distant relationships.
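The false-positive vs. true-positive plot described above can be illustrated with a minimal Python sketch. It assumes the search output is available as (sequence ID, E-value) pairs and the expert-curated relatives as a set of IDs. The function name `fp_tp_curve`, the sequence IDs, and the E-values are all hypothetical placeholders, not data or code from this project.

```python
from typing import Iterable, List, Set, Tuple


def fp_tp_curve(
    hits: Iterable[Tuple[str, float]],
    true_relatives: Set[str],
) -> List[Tuple[int, int]]:
    """Walk the hits in order of increasing E-value and record the
    cumulative (false positives, true positives) after each hit."""
    points = []
    fp = tp = 0
    for seq_id, _evalue in sorted(hits, key=lambda hit: hit[1]):
        if seq_id in true_relatives:
            tp += 1  # hit is on the curated list of related proteins
        else:
            fp += 1  # hit is not on the curated list
        points.append((fp, tp))
    return points


# Made-up example: four hits for one query, three of them curated relatives.
hits = [("seq_a", 1e-30), ("seq_b", 0.5), ("seq_c", 1e-5), ("seq_d", 3.0)]
curated = {"seq_a", "seq_c", "seq_d"}
curve = fp_tp_curve(hits, curated)
print(curve)  # [(0, 1), (0, 2), (1, 2), (1, 3)]
```

Plotting the second coordinate against the first gives one curve per query; a more sensitive search reaches more true positives before the first false positives appear.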
The changes adopted include:
1) Filtering the database sequences, rather than the query, for segments of restricted amino acid composition;
2) Calculating E-values based upon the composition of the database sequence hit rather than upon a standard protein amino acid composition;
3) Letting gaps in a given alignment column render the projected amino acid frequencies for that column closer to background frequencies;
4) Decreasing the pseudocount constant from 10 to 7;
5) Increasing the percent difference from other sequences required for inclusion in the multiple alignment from 2% to 5%.
Most of these changes have been incorporated into the version of PSI-BLAST now available over the public NCBI web page, and the remaining changes will be made available at the time of publication. The new program is much less likely to return false positives with spuriously low E-values.
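To see why the pseudocount constant in change 4 matters, consider a generic pseudocount scheme in which a column's estimated frequencies are a weighted average of the observed frequencies and a prior. The sketch below is a toy model under stated assumptions, not PSI-BLAST's actual code: the prior is simply the background distribution, the alphabet has two letters, and `observed_weight` is a hypothetical stand-in for however much evidence the alignment column provides.

```python
from typing import Dict


def blend_frequencies(
    observed: Dict[str, float],
    background: Dict[str, float],
    observed_weight: float,
    pseudocount: float,
) -> Dict[str, float]:
    """Weighted average of observed and background frequencies.

    A larger pseudocount pulls the estimate toward the background;
    a larger observed_weight pulls it toward the observed data.
    """
    total = observed_weight + pseudocount
    return {
        letter: (observed_weight * observed.get(letter, 0.0)
                 + pseudocount * background[letter]) / total
        for letter in background
    }


# Toy two-letter alphabet; the column contains only "A".
background = {"A": 0.5, "B": 0.5}
observed = {"A": 1.0, "B": 0.0}

q10 = blend_frequencies(observed, background, observed_weight=3.0, pseudocount=10.0)
q7 = blend_frequencies(observed, background, observed_weight=3.0, pseudocount=7.0)
print(round(q10["A"], 3), round(q7["A"], 3))  # 0.615 0.65
```

In this toy model, lowering the constant from 10 to 7 lets the observed data count for more. Lowering `observed_weight` for a column would have the opposite effect, moving the estimate toward the background, which is the direction change 3 describes for columns with gaps; the source does not say how that adjustment is actually computed.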
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by STEPHEN F ALTSCHUL
Similar overseas grants
Travel: Student Support for the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)
- Grant number: 2409649
- Fiscal year: 2024
- Project category: Standard Grant
CAREER: Explanation-based Optimization of Diversified Information Retrieval to Enhance AI Systems
- Grant number: 2339932
- Fiscal year: 2024
- Project category: Continuing Grant
SBIR Phase I: Knowledge Graph-powered Information Retrieval and Causal Inference
- Grant number: 2335357
- Fiscal year: 2024
- Project category: Standard Grant
SaTC: CORE: Small: Communication-Efficient, Fault-Tolerant Private Information Retrieval over Erasure Coded Storage
- Grant number: 2326312
- Fiscal year: 2023
- Project category: Continuing Grant
SaTC: CORE: Small: Practical Private Information Retrieval
- Grant number: 2246386
- Fiscal year: 2023
- Project category: Standard Grant
A Study on Information Retrieval by Similarity and Heterogeneity of Concepts
- Grant number: 23K11764
- Fiscal year: 2023
- Project category: Grant-in-Aid for Scientific Research (C)
Large-scale general-purpose language models for information retrieval tasks
- Grant number: 22K21303
- Fiscal year: 2022
- Project category: Grant-in-Aid for Research Activity Start-up
Studying Visual Analytics Support for Interactive Information Retrieval within Complex Search Settings
- Grant number: RGPIN-2017-06446
- Fiscal year: 2022
- Project category: Discovery Grants Program - Individual
Time-aware Community-enhanced Social Information Retrieval
- Grant number: RGPIN-2021-03170
- Fiscal year: 2022
- Project category: Discovery Grants Program - Individual
Statistical Computation and Information Retrieval from Multivariate Data
- Grant number: RGPIN-2018-05663
- Fiscal year: 2022
- Project category: Discovery Grants Program - Individual













