
UTILIZING TERAGRID TO DETECT REMOTE SIMILARITY PROTEIN SEQUENCES


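Basic information

Grant number:
7723381
Principal investigator:
MARK FIENUP
Amount:
$500
Host institution:
CARNEGIE-MELLON UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2008
Funding country:
United States
Project period:
2008-08-01 to 2009-07-31
Project status:
Completed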

Project abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

The structure of a protein is often a key to its function. However, determining the structure of a protein by experimental methods, such as X-ray crystallography or nuclear magnetic resonance (NMR), requires significant time and cost. There are currently fewer than 50,000 protein structures deposited in the Protein Data Bank (PDB), of which about 80% are redundant. On the other hand, genomic sequencing efforts, such as the Human Genome Project, have populated protein sequence databases with well over 5 million sequences. With the increasing gap between known sequences and experimentally determined structures, computational methods capable of predicting the structure and function of proteins will play an increasing role in protein annotation studies.

The ultimate goal of the research described in this proposal is to develop a new protein sequence homology detection method that leverages the growing body of protein sequence data in ways that existing methods do not. The increased sensitivity in recognizing relationships between amino acid sequences will be achieved by applying intermediate sequence search strategies and profile-profile techniques. To date, progress in this area has been limited by the lack of the computational resources needed to perform the transitive profile-profile search. We propose to utilize the TeraGrid to develop and test the first intermediate profile-profile algorithm for detecting protein sequence similarities.

The algorithm constructs a sequence profile for the input amino acid sequence (the target) and uses it to transitively search the database of all representative profiles for sequences in nr. In the transitive search, the matches found by the first sequence comparison are used as new queries against the database, and the whole process is repeated iteratively with these new matches. The similarity between the target profile and a profile from the database is thus established through the intermediate sequences.

Our project will be carried out in two stages:
1. In the first stage, we will generate the set of representative alignment profiles for sequences from the non-redundant protein sequence database nr.
2. In the second stage, we will deploy and test our algorithm.
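The transitive search described in the abstract can be illustrated with a minimal sketch. This is not the proposal's implementation: the abstract does not specify how profiles are built or compared, so the sketch assumes a toy "profile" (the set of 2-mers in a sequence) and a toy score (Jaccard overlap) as stand-ins for real profile-profile comparison. The names `transitive_search`, `kmer_profile`, `jaccard`, `threshold`, and `max_rounds` are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List


def kmer_profile(seq: str, k: int = 2) -> FrozenSet[str]:
    """Toy stand-in for a sequence profile: the set of k-mers in the sequence."""
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Toy stand-in for a profile-profile score: Jaccard overlap of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transitive_search(
    target_id: str,
    target_profile: FrozenSet[str],
    database: Dict[str, FrozenSet[str]],
    score: Callable[[FrozenSet[str], FrozenSet[str]], float],
    threshold: float,
    max_rounds: int = 3,
) -> Dict[str, List[str]]:
    """Map each reachable database entry to its chain of intermediates from the target."""
    paths: Dict[str, List[str]] = {}
    # Round 1 queries with the target; later rounds query with the previous round's hits.
    frontier = [(target_profile, [target_id])]
    for _ in range(max_rounds):
        next_frontier = []
        for query_profile, path in frontier:
            for entry_id, profile in database.items():
                if entry_id in paths or entry_id == target_id:
                    continue  # already linked to the target
                if score(query_profile, profile) >= threshold:
                    paths[entry_id] = path + [entry_id]
                    next_frontier.append((profile, paths[entry_id]))
        if not next_frontier:
            break  # no new matches, so the search has converged
        frontier = next_frontier
    return paths


# Hypothetical toy data: C shares nothing with the query directly,
# but is linked to it through the intermediate B.
database = {
    "B": kmer_profile("CCCCGGGG"),
    "C": kmer_profile("GGGGTTTT"),
    "D": kmer_profile("WWWWYYYY"),
}
hits = transitive_search("query", kmer_profile("AAAACCCC"), database, jaccard, threshold=0.15)
for entry_id, path in hits.items():
    print(entry_id, " -> ".join(path))
```

In this toy data, a direct comparison of the query against C scores zero, yet C is still reported because B matches both. That is the sense in which similarity is "established through the intermediate sequences". Each additional round re-queries the whole database with every new hit, which suggests why the cost of such a search grows quickly at the scale of nr.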
