SUBSTITUTION MATRICES INTO THE NSP-TREE IN BIOLOGICAL SEQUENCE DATABASES
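Basic information
- Grant number: 8167540
- Principal investigator:
- Amount: $29,700
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-04-01 to 2011-03-31
- Project status: Completed
- Source: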
- Keywords: Algorithms; Biological; Biological databases; Blast Cell; Computer Retrieval of Information on Scientific Projects Database; Data; Databases; Funding; Grant; Institution; Letters; Performance; Research; Research Personnel; Resources; Sequence Alignment; Source; Structure; Trees; United States National Institutes of Health; base; design; heuristics; indexing; novel; operation; programs; research study; tool
Project abstract
This subproject is one of many research subprojects utilizing the
resources provided by a Center grant funded by NIH/NCRR. The subproject and
investigator (PI) may have received primary funding from another NIH source,
and thus could be represented in other CRISP entries. The institution listed is
for the Center, which is not necessarily the institution for the investigator.
A basic operation on biological sequence databases is to locate homologous regions for a given query sequence using pair-wise alignments. Unfortunately, the dynamic programming algorithm used for sequence alignments is computationally expensive, making it prohibitive for today's rapidly growing sequence databases. Existing alignment tools, such as FASTA and BLAST, though fast in locating candidate homologous regions, sacrifice sensitivity for efficiency: they may miss some true homologous regions in database sequences. In this project, we will develop novel indexing algorithms for large biological databases that support efficient pair-wise sequence alignments with high sensitivity. Specifically, we will incorporate widely used substitution matrices, such as PAM and BLOSUM, into the construction algorithms of the NSP-tree (an index structure designed for sequence data) so that sequences with evolutionarily related letters are grouped together in the structure of the NSP-tree. As a result, indexed sequence groups with unrelated letters will obtain a low score when aligned to a given query sequence, and be promptly pruned. By enhancing the pruning power of the NSP-tree, we expect that the new index-based approach will provide high sensitivity while maintaining a comparable or even higher level of efficiency than that of existing pair-wise alignment tools.
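To make the cost argument concrete, here is a minimal sketch of the standard dynamic programming recurrence for local alignment (Smith-Waterman with a linear gap penalty). It is not the project's algorithm; the function name, the gap value, and the `toy_score` scoring function are assumptions for illustration, and the scores are not real PAM or BLOSUM values. Each query/target pair fills a table of size len(query) x len(target), which is why running it against every database sequence becomes expensive.

```python
from typing import Callable


def local_alignment_score(query: str, target: str,
                          score: Callable[[str, str], int],
                          gap: int = -2) -> int:
    """Return the best local alignment score (Smith-Waterman, linear gaps).

    Only the previous row of the DP table is kept, so memory is O(len(target))
    while time remains O(len(query) * len(target)).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0: empty prefix of the target
        for j, t in enumerate(target, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(q, t),  # align q with t
                prev[j] + gap,              # gap in the target
                curr[j - 1] + gap,          # gap in the query
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring: +2 for identical letters, -1 otherwise."""
    return 2 if a == b else -1


if __name__ == "__main__":
    print(local_alignment_score("ACGT", "TTACGTTT", toy_score))
```

In practice the `score` argument would look up a pair of letters in a substitution matrix; the recurrence itself stays the same.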
The project will be conducted in four steps: 1) Developing a new dynamic programming query algorithm to handle the alignments between a query sequence and sequence groups indexed in the tree; 2) Based on the substitution matrices, analyzing functionally conservative letters in biological sequences, and creating a clustering tree that hierarchically organizes the proximity of the letters based on their evolutionary closeness; 3) Designing new heuristics that incorporate the clustering tree of letters into the construction algorithms of the NSP-tree; and 4) Conducting experimental studies on the performance of the new heuristics and comparing the performance of the NSP-tree with that of the existing tools.
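Step 2 above can be illustrated with a small sketch: given pairwise similarity scores between letters, an agglomerative (average-linkage) procedure repeatedly merges the two most similar clusters, producing a tree in which closely related letters sit near each other. The abstract does not say which clustering method the project uses, so average linkage is an assumption here, as are the names `TOY_SCORES`, `sim`, and `cluster_letters`. The scores are made-up values for five amino acid letters, not entries from a real PAM or BLOSUM matrix.

```python
from itertools import combinations

# Hypothetical similarity scores (higher = more closely related letters).
# These are illustrative values, NOT a real PAM/BLOSUM table.
TOY_SCORES = {
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("D", "E"): 2,
    ("I", "D"): -3, ("I", "E"): -3, ("L", "D"): -4, ("L", "E"): -3,
    ("V", "D"): -3, ("V", "E"): -2,
}


def sim(a, b):
    """Symmetric lookup of the similarity between two letters."""
    return TOY_SCORES[(a, b)] if (a, b) in TOY_SCORES else TOY_SCORES[(b, a)]


def cluster_letters(letters):
    """Average-linkage agglomerative clustering.

    Returns the tree as nested 2-tuples whose leaves are the letters.
    """
    # Each cluster is (subtree, tuple_of_member_letters).
    clusters = [(x, (x,)) for x in letters]

    def avg(i, j):
        m1, m2 = clusters[i][1], clusters[j][1]
        return sum(sim(a, b) for a in m1 for b in m2) / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(*p))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    print(cluster_letters(["I", "L", "V", "D", "E"]))
```

With these toy scores the hydrophobic letters I, L, and V end up in one subtree and the acidic letters D and E in another, which is the kind of grouping an index could exploit when deciding which sequences to store together.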
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by GANG QIAN
USE THE EDIT DISTANCE IN THE ND-TREE FOR EFFICIENT BIOINFORMATICS QUERIES
- Grant number: 7960025
- Fiscal year: 2009
- Funding amount: $29,700
- Project category:
USE THE EDIT DISTANCE IN THE ND-TREE FOR EFFICIENT BIOINFORMATICS QUERIES
- Grant number: 7725103
- Fiscal year: 2008
- Funding amount: $29,700
- Project category:
BULK-LOADING & PERFORMANCE STUDIES OF THE ND-TREE FOR LARGE GENOME DATABASES
- Grant number: 7610287
- Fiscal year: 2007
- Funding amount: $29,700
- Project category:
Similar overseas grants
Hyperlink Management System for automated integration of biological databases by use of data IDs
- Grant number: 25280108
- Fiscal year: 2013
- Funding amount: $29,700
- Project category: Grant-in-Aid for Scientific Research (B)
Enabling systems biology through machine learning: intelligent software to automatically summarize and combine large-scale biological databases
- Grant number: 327585-2006
- Fiscal year: 2008
- Funding amount: $29,700
- Project category: Discovery Grants Program - Individual
BDI-CIEG- Cyberinfrastructure Experience for Graduate Students In Biological Databases and Informatics: Workshop at UC-San Diego, from July 1 to Sept. 8, 2007.
- Grant number: 0726924
- Fiscal year: 2007
- Funding amount: $29,700
- Project category: Standard Grant
Enabling systems biology through machine learning: intelligent software to automatically summarize and combine large-scale biological databases
- Grant number: 327585-2006
- Fiscal year: 2007
- Funding amount: $29,700
- Project category: Discovery Grants Program - Individual
Enabling systems biology through machine learning: intelligent software to automatically summarize and combine large-scale biological databases
- Grant number: 327585-2006
- Fiscal year: 2006
- Funding amount: $29,700
- Project category: Discovery Grants Program - Individual
Collaborative Research: Endowing Biological Databases with Analytical Power: Indexing, Querying, and Mining of Complex Biological Structures
- Grant number: 0515936
- Fiscal year: 2005
- Funding amount: $29,700
- Project category: Standard Grant
Collaborative Research: Endowing Biological Databases With Analytical Power: Indexing, Querying, and Mining of Complex Biological Structures
- Grant number: 0515813
- Fiscal year: 2005
- Funding amount: $29,700
- Project category: Standard Grant
Using Biological Databases to Improve Biodiversity Assessments: New Methods For Geographic-Based Analysis
- Grant number: 0109969
- Fiscal year: 2001
- Funding amount: $29,700
- Project category: Standard Grant
Construction and Retrieval of Highly Integrated Biological Databases
- Grant number: 12208007
- Fiscal year: 2000
- Funding amount: $29,700
- Project category: Grant-in-Aid for Scientific Research on Priority Areas
Study on the DNA database and other biological databases
- Grant number: 10041187
- Fiscal year: 1998
- Funding amount: $29,700
- Project category: Grant-in-Aid for Scientific Research (A)