Database on demand - creating customized sequence databases for efficient protein identification

Basic information

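Grant reference: BB/F016255/1
Principal investigator: Rolf Apweiler
Amount: $61,600
Host institution: European Bioinformatics Institute
Host institution country: United Kingdom
Project type: Research Grant
Fiscal year: 2008
Funding country: United Kingdom
Period: 2008 to (no data)
Project status: Closed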

Source: https://gtr.ukri.org/projects?ref=BB%2FF016255%2F1
Keywords: Database demand creating customized sequence

Project abstract

The field of proteomics attempts to identify and characterize the protein complement of cells or tissues. The most popular analytical technique to achieve these goals is mass spectrometry. The mass spectra obtained from these instruments are usually identified by comparing them with spectra predicted from the protein sequences in sequence databases. Sophisticated computer algorithms, such as the MASCOT search engine (http://www.matrixscience.com), have been developed to automate this task and to cope with the large amounts of data generated by this approach. Interestingly, only a minor fraction of the acquired spectra can be assigned to known proteins. Since all proteins can undergo certain changes during their lifetime in (or outside) the cell, the search algorithms are built to take some of these changes into account. Mass differences caused by so-called post-translational modifications (e.g., phosphorylation) can usually be included as an option in these search engines. Unfortunately, proteolytic cleavage, another biologically relevant form of protein processing, is not taken into consideration. The biological relevance of cleavage events is exemplified by the fact that many proteins found in the circulatory system (e.g., in plasma or serum) show signs of proteolytic degradation. The cleavage patterns that these proteins or their fragments carry are hypothesized to be highly significant as biomarkers of abnormal processes in the body at large. The ability to identify such degradation products reliably and quickly could thus play an important role in the early detection of disease. Another point often overlooked by search engines concerns common contaminants in samples, ranging from the pig trypsin used to digest the samples to mycobacterial or viral infections of the cell lines under study.
Finally, the occurrence of sequence variations (through splice variants or single amino acid polymorphisms) can further confound the identification process, which makes research into the frequency and importance of these minor sequence variations far from straightforward. It is clear from the above that the spectra that elude identification for these reasons are of great biological interest. It is also clear that reliable tools for identifying such spectra already exist, provided that they can match each spectrum against the correct sequence. To furnish these algorithms with an enriched set of sequences against which to match the acquired mass spectra, simple pre-processing of the original sequence database suffices. In this project, we propose to develop a tool that allows the user to obtain such a customized, enriched sequence database. On a user-friendly web form, the user will be able to specify a pre-processing step (or a combination of steps) to apply to the sequence database. The software will then generate the corresponding database and format it so that it can readily be used in search engines such as MASCOT. Once notified that processing is complete, the user simply downloads the generated database via a web link and uploads it into MASCOT (or any other search engine). The software will be a highly modular layer between the user and the sequence database; it will enable pre-processing steps suited to current proteomics analyses and will be easily extensible to meet future requirements from the community. This simple step of enriching the sequence database against which mass spectra are matched will enhance the identification efficiency of current research projects (and enable the re-analysis of previous ones), and has the potential to unlock novel and highly interesting biological findings.
As such, the tool holds great promise as a means to raise the value-for-money of proteomics experiments, while at the same time expanding the reach of the field.
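
The abstract describes a modular layer that applies user-selected pre-processing steps to a sequence database and emits a file a search engine can load. The following is a minimal Python sketch of that idea, not the project's actual software. It assumes FASTA as both input and output format (the usual format for search-engine sequence databases) and implements three illustrative steps drawn from the abstract: single amino acid variants, N-terminal truncations as a stand-in for proteolytic cleavage at unknown sites, and appended contaminant entries. All function names, the accession-suffix conventions and the example sequences are hypothetical, and the contaminant entry is a placeholder rather than a real trypsin sequence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Protein:
    accession: str
    description: str
    sequence: str


# A pre-processing step turns a stream of entries into a (usually larger) stream.
Step = Callable[[Iterable[Protein]], Iterator[Protein]]


def parse_fasta(text: str) -> Iterator[Protein]:
    """Parse FASTA text whose headers look like '>ACCESSION description'."""
    accession, description, chunks = None, "", []
    for line in map(str.strip, text.splitlines()):
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                yield Protein(accession, description, "".join(chunks))
            parts = line[1:].split(maxsplit=1) or [""]
            accession, description = parts[0], (parts[1] if len(parts) > 1 else "")
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        yield Protein(accession, description, "".join(chunks))


def apply_variants(variants: Dict[str, List[Tuple[int, str, str]]]) -> Step:
    """Keep each entry and add one entry per single amino acid variant.
    `variants` maps accession -> [(1-based position, reference, alternative)]."""
    def step(proteins: Iterable[Protein]) -> Iterator[Protein]:
        for p in proteins:
            yield p
            for pos, ref, alt in variants.get(p.accession, []):
                if not 1 <= pos <= len(p.sequence) or p.sequence[pos - 1] != ref:
                    raise ValueError(f"{p.accession}: no {ref} at position {pos}")
                seq = p.sequence[:pos - 1] + alt + p.sequence[pos:]
                yield Protein(f"{p.accession}_{ref}{pos}{alt}", p.description, seq)
    return step


def add_n_terminal_truncations(min_length: int) -> Step:
    """Keep each entry and add every N-terminally truncated form of at least
    `min_length` residues, as a stand-in for cleavage at unknown sites.
    The residues added per entry grow quadratically with its length."""
    def step(proteins: Iterable[Protein]) -> Iterator[Protein]:
        for p in proteins:
            yield p
            for start in range(1, len(p.sequence) - min_length + 1):
                yield Protein(f"{p.accession}_ntrunc{start}", p.description,
                              p.sequence[start:])
    return step


def append_contaminants(contaminants: Iterable[Protein]) -> Step:
    """Pass all entries through, then append the given contaminant entries."""
    extra = tuple(contaminants)  # materialize so the step can be reused

    def step(proteins: Iterable[Protein]) -> Iterator[Protein]:
        yield from proteins
        yield from extra
    return step


def run_pipeline(proteins: Iterable[Protein], steps: List[Step]) -> Iterator[Protein]:
    """Chain the user-selected steps lazily, in the order given."""
    for step in steps:
        proteins = step(proteins)
    return iter(proteins)


def to_fasta(proteins: Iterable[Protein], width: int = 60) -> str:
    """Serialize entries as FASTA, wrapping sequence lines at `width` residues."""
    lines: List[str] = []
    for p in proteins:
        lines.append(f">{p.accession} {p.description}".rstrip())
        lines.extend(p.sequence[i:i + width] for i in range(0, len(p.sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    source = ">P1 hypothetical example protein\nMKTAYIAKQR\n"
    # Placeholder contaminant entry; NOT a real trypsin sequence.
    contaminant = Protein("CONT_0001", "placeholder contaminant", "GGGGSAMPLEK")
    steps = [
        apply_variants({"P1": [(3, "T", "A")]}),
        add_n_terminal_truncations(min_length=8),
        append_contaminants([contaminant]),
    ]
    print(to_fasta(run_pipeline(parse_fasta(source), steps)), end="")
```

Because each step maps a stream of entries to a larger stream, the order of `steps` matters: placing `apply_variants` before the truncation step, as in the example, also truncates the variant entries. A new step only has to follow the `Step` signature, so it can be added without touching the existing ones, which mirrors the extensibility the abstract calls for.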

Project outputs

Journal articles: 1
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

Other publications by Rolf Apweiler

In Silico Characterization of Proteins: UniProt, InterPro and Integr8

DOI: 10.1007/s12033-007-9003-x
Published: 2007-10-04
Journal: MOLECULAR BIOTECHNOLOGY
Impact factor: 2.500
Authors: Nicola Jane Mulder; Paul Kersey; Manuela Pruess; Rolf Apweiler
Corresponding author: Rolf Apweiler

Linking publication, gene and protein data

DOI: 10.1038/ncb1495
Published: 2006-11-01
Journal: NATURE CELL BIOLOGY
Impact factor: 19.100
Authors: Paul Kersey; Rolf Apweiler
Corresponding author: Rolf Apweiler

Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions

DOI: 10.1186/1741-7007-5-44
Published: 2007-10-09
Journal: BMC BIOLOGY
Impact factor: 4.500
Authors: Samuel Kerrien; Sandra Orchard; Luisa Montecchi-Palazzi; Bruno Aranda; Antony F Quinn; Nisha Vinod; Gary D Bader; Ioannis Xenarios; Jérôme Wojcik; David Sherman; Mike Tyers; John J Salama; Susan Moore; Arnaud Ceol; Andrew Chatr-aryamontri; Matthias Oesterheld; Volker Stümpflen; Lukasz Salwinski; Jason Nerothin; Ethan Cerami; Michael E Cusick; Marc Vidal; Michael Gilson; John Armstrong; Peter Woollard; Christopher Hogue; David Eisenberg; Gianni Cesareni; Rolf Apweiler; Henning Hermjakob
Corresponding author: Henning Hermjakob

Whither systems medicine?

DOI: 10.1038/emm.2017.290
Published: 2018-03-02
Journal: EXPERIMENTAL AND MOLECULAR MEDICINE
Impact factor: 12.900
Authors: Rolf Apweiler; Tim Beissbarth; Michael R Berthold; Nils Blüthgen; Yvonne Burmeister; Olaf Dammann; Andreas Deutsch; Friedrich Feuerhake; Andre Franke; Jan Hasenauer; Steve Hoffmann; Thomas Höfer; Peter LM Jansen; Lars Kaderali; Ursula Klingmüller; Ina Koch; Oliver Kohlbacher; Lars Kuepfer; Frank Lammert; Dieter Maier; Nico Pfeifer; Nicole Radde; Markus Rehm; Ingo Roeder; Julio Saez-Rodriguez; Ulrich Sax; Bernd Schmeck; Andreas Schuppert; Bernd Seilheimer; Fabian J Theis; Julio Vera; Olaf Wolkenhauer
Corresponding author: Olaf Wolkenhauer