Database on demand - creating customized sequence databases for efficient protein identification

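Basic information

  • Grant number:
    BB/F016255/1
  • Principal investigator:
  • Amount:
    $61,600
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2008
  • Funding country:
    United Kingdom
  • Period:
    2008 to (no data)
  • Project status:
    Completed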

Project abstract

The field of proteomics attempts to identify and characterize the protein complement of cells or tissues. The most popular analytical technique for achieving these goals is mass spectrometry. The mass spectra obtained from these instruments are usually identified by comparing them with spectra predicted from the protein sequences in sequence databases. Sophisticated computer algorithms, such as the MASCOT search engine (http://www.matrixscience.com), have been developed to automate this task and to cope with the large amounts of data that the approach generates. Interestingly, only a minor fraction of the acquired spectra can be assigned to known proteins. Since all proteins can potentially undergo changes during their lifetime inside (or outside) the cell, the search algorithms are built to take certain changes into account. Mass differences caused by the addition of so-called posttranslational modifications (e.g. phosphorylation) are usually available as an option in these search engines. Unfortunately, proteolytic cleavage, another biologically relevant form of protein processing, is not taken into consideration. The biological relevance of cleavage events is exemplified by the fact that many proteins found in the circulatory system (e.g. in plasma or serum) show signs of proteolytic degradation. The cleavage patterns that these proteins or their fragments carry are hypothesized to have great significance as biomarkers for abnormal processes in the body at large. The ability to identify such degradation products reliably and quickly can thus play an important role in the early detection of disease. Another point that search engines often overlook concerns common contaminants found in samples, ranging from the pig trypsin that is used to digest the samples to mycobacterial or viral infection of the cell lines under study.
Finally, the occurrence of sequence variations (through splice variants or single amino acid polymorphisms) can further confound the identification process. Research into the frequency and importance of these minor sequence variations is therefore not straightforward. It is clear from the above that the spectra that elude identification for these reasons are of great biological interest. It is also clear that the tools for reliably identifying such spectra are available, provided that they can match the spectrum against the correct sequence. To furnish these algorithms with an enhanced set of sequences against which to match the acquired mass spectra, simple pre-processing steps applied to the original sequence database suffice. In this project, we propose to develop a tool that will allow the user to obtain such a customized, enriched sequence database. The user will be able to specify, on a user-friendly web form, the pre-processing steps (or combination of steps) that should be applied to the sequence database. The software will then generate the corresponding database and format it so that it can readily be used in search engines such as MASCOT. Once notified that the process has completed, the user will simply need to download the generated database by following a web link and upload it into MASCOT (or any other search engine). The software will be a highly modular layer between the user and the sequence database that will enable pre-processing steps suited to current-day proteomics analyses, and it will be easily extensible to meet future requirements from the community. This simple step of enriching the sequence database against which mass spectra are matched will enhance the identification efficiency of current research projects (as well as enabling the re-analysis of previous efforts) and has the potential to unlock novel and highly interesting biological findings.
As such, the tool holds great promise as a means to raise the value-for-money of proteomics experiments, while at the same time expanding the reach of the field.
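
The kind of pre-processing the abstract describes can be illustrated with a minimal Python sketch. It is not the project's software: the function names, the `[Nterm-trunc k]` and `CONTAM` header tags, and the toy sequences are all assumptions made for illustration. It takes a FASTA database, adds N-terminally truncated copies of each protein as a simple stand-in for proteolytic processing, appends a list of contaminant sequences, and writes the result back out as FASTA text that a search engine could in principle load.

```python
from typing import Iterator, List, Optional, Tuple

FastaEntry = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(text: str) -> List[FastaEntry]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    entries: List[FastaEntry] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def n_terminal_truncations(
    entry: FastaEntry, max_removed: int, min_length: int = 6
) -> Iterator[FastaEntry]:
    """Yield the entry itself, then copies with 1..max_removed N-terminal
    residues removed (hypothetical model of proteolytic processing)."""
    header, seq = entry
    yield entry
    for k in range(1, max_removed + 1):
        truncated = seq[k:]
        if len(truncated) < min_length:
            break  # stop once the fragment is too short to be useful
        yield (f"{header} [Nterm-trunc {k}]", truncated)


def build_custom_database(source: str, contaminants: str, max_removed: int) -> str:
    """Combine truncated variants and contaminants into one FASTA string."""
    records: List[str] = []
    for entry in parse_fasta(source):
        for header, seq in n_terminal_truncations(entry, max_removed):
            records.append(f">{header}\n{seq}")
    for header, seq in parse_fasta(contaminants):
        records.append(f">CONTAM {header}\n{seq}")  # tag is an assumption
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Toy sequences, not real database entries.
    demo_source = ">P1 demo\nMKTAYIAKQR\n"
    demo_contaminants = ">contaminant_demo\nIVGGYTCAAN\n"
    print(build_custom_database(demo_source, demo_contaminants, 2), end="")
```

Each pre-processing step is an independent function that maps entries to entries, so further steps (for example C-terminal truncation or sequence variants) could be chained in the same way, in the spirit of the modular design the abstract proposes.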

Project outputs

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Rolf Apweiler

In Silico Characterization of Proteins: UniProt, InterPro and Integr8
  • DOI:
    10.1007/s12033-007-9003-x
  • Published:
    2007-10-04
  • Journal:
  • Impact factor:
    2.500
  • Authors:
    Nicola Jane Mulder;Paul Kersey;Manuela Pruess;Rolf Apweiler
  • Corresponding author:
    Rolf Apweiler
Linking publication, gene and protein data
  • DOI:
    10.1038/ncb1495
  • Published:
    2006-11-01
  • Journal:
  • Impact factor:
    19.100
  • Authors:
    Paul Kersey;Rolf Apweiler
  • Corresponding author:
    Rolf Apweiler
Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions
  • DOI:
    10.1186/1741-7007-5-44
  • Published:
    2007-10-09
  • Journal:
  • Impact factor:
    4.500
  • Authors:
    Samuel Kerrien;Sandra Orchard;Luisa Montecchi-Palazzi;Bruno Aranda;Antony F Quinn;Nisha Vinod;Gary D Bader;Ioannis Xenarios;Jérôme Wojcik;David Sherman;Mike Tyers;John J Salama;Susan Moore;Arnaud Ceol;Andrew Chatr-aryamontri;Matthias Oesterheld;Volker Stümpflen;Lukasz Salwinski;Jason Nerothin;Ethan Cerami;Michael E Cusick;Marc Vidal;Michael Gilson;John Armstrong;Peter Woollard;Christopher Hogue;David Eisenberg;Gianni Cesareni;Rolf Apweiler;Henning Hermjakob
  • Corresponding author:
    Henning Hermjakob
Whither systems medicine?
  • DOI:
    10.1038/emm.2017.290
  • Published:
    2018-03-02
  • Journal:
  • Impact factor:
    12.900
  • Authors:
    Rolf Apweiler;Tim Beissbarth;Michael R Berthold;Nils Blüthgen;Yvonne Burmeister;Olaf Dammann;Andreas Deutsch;Friedrich Feuerhake;Andre Franke;Jan Hasenauer;Steve Hoffmann;Thomas Höfer;Peter LM Jansen;Lars Kaderali;Ursula Klingmüller;Ina Koch;Oliver Kohlbacher;Lars Kuepfer;Frank Lammert;Dieter Maier;Nico Pfeifer;Nicole Radde;Markus Rehm;Ingo Roeder;Julio Saez-Rodriguez;Ulrich Sax;Bernd Schmeck;Andreas Schuppert;Bernd Seilheimer;Fabian J Theis;Julio Vera;Olaf Wolkenhauer
  • Corresponding author:
    Olaf Wolkenhauer

Other grants held by Rolf Apweiler

ARGENT: ARgentinian GEnomics for Tuberculosis
  • Grant number:
    EP/T015446/1
  • Fiscal year:
    2019
  • Amount:
    $61,600
  • Project type:
    Research Grant
Embracing new technologies to streamline improve and sustain InterPro and its contributing databases
  • Grant number:
    BB/F010508/1
  • Fiscal year:
    2008
  • Amount:
    $61,600
  • Project type:
    Research Grant
Further development of the QuickGO web interface for browsing and retrieving Gene Ontology Annotation data
  • Grant number:
    BB/E023541/1
  • Fiscal year:
    2007
  • Amount:
    $61,600
  • Project type:
    Research Grant
ProteomeHarvest - Excel/XML Bridge for User-friendly Proteomics Data Collection
  • Grant number:
    BB/E00573X/1
  • Fiscal year:
    2006
  • Amount:
    $61,600
  • Project type:
    Research Grant

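Similar NSFC (National Natural Science Foundation of China) grants

Estimating Large Demand Systems with Machine Learning Techniques
  • Grant number:
  • Approval year:
    2024
  • Amount:
  • Project type:
    Research Fund for International Scientists
Mechanism of a dual-responsive hydrogel system with "on-demand" silver release in the treatment of diabetic periodontitis
  • Grant number:
    82301140
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund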

Similar overseas grants

CAREER: Algorithmic Foundations for Demand-Responsive Transit Systems - Creating More Equitable and Sustainable Cities through Better Transit
  • Grant number:
    2144127
  • Fiscal year:
    2022
  • Amount:
    $61,600
  • Project type:
    Continuing Grant
Research on consumers' food demand for creating strong Japanese agricultural structure using machine learning methods
  • Grant number:
    21K05795
  • Fiscal year:
    2021
  • Amount:
    $61,600
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Creating virtual demand response assets using predictive modelling, with special application to Thailand
  • Grant number:
    133941
  • Fiscal year:
    2020
  • Amount:
    $61,600
  • Project type:
    Collaborative R&D
BECKON - Block Estimate Chain: creating Knowledge ON demand & protecting privacy
  • Grant number:
    10133117
  • Fiscal year:
    2019
  • Amount:
    $61,600
  • Project type:
BECKON - Block Estimate Chain: creating Knowledge ON demand & protecting privacy
  • Grant number:
    9920181
  • Fiscal year:
    2019
  • Amount:
    $61,600
  • Project type:
Creating virtual demand response assets using predictive modelling, with special application to Thailand
  • Grant number:
    132960
  • Fiscal year:
    2018
  • Amount:
    $61,600
  • Project type:
    Feasibility Studies
BECKON - Block Estimate Chain: creating Knowledge ON demand & protecting privacy
  • Grant number:
    9371707
  • Fiscal year:
    2017
  • Amount:
    $61,600
  • Project type:
Development and utilization of integrated paper devices for creating demand for wood to restore Japanese forestry
  • Grant number:
    17KT0069
  • Fiscal year:
    2017
  • Amount:
    $61,600
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Shrinking the food-print by creating consumer demand for sustainable and healthy eating
  • Grant number:
    DP130102820
  • Fiscal year:
    2013
  • Amount:
    $61,600
  • Project type:
    Discovery Projects
New Horizon of Travel Demand Analysis and Its Applications to Creating More Travel Demand
  • Grant number:
    09630107
  • Fiscal year:
    1997
  • Amount:
    $61,600
  • Project type:
    Grant-in-Aid for Scientific Research (C)