ProteomeHarvest - Excel/XML Bridge for User-friendly Proteomics Data Collection

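Basic information

Grant number:
BB/E00573X/1
Principal investigator:
Rolf Apweiler
Amount:
approx. $64,000
Host institution:
European Bioinformatics Institute
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2006
Funding country:
United Kingdom
Start and end dates:
2006 to (no data)
Project status:
Completed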

Source:
https://gtr.ukri.org/projects?ref=BB%2FE00573X%2F1
Keywords:
ProteomeHarvest Excel XML Bridge User

Project abstract

Today, scientific experiments in molecular biology in general, and in proteomics in particular, are often done on a large scale, producing large numbers of individual data items. These large data sets are then the basis of scientific publications. Often only relatively few results actually contribute to the final conclusions reached by the researcher, but the complete result sets can provide valuable knowledge to other researchers comparing them to their own results. However, to allow others to understand how the experiments were done, they need to be described in a very detailed manner. To avoid 'comparing apples and pears', this description needs to be done in a systematic manner, using established rules or standards on how to describe experiments. In addition, the data needs to be easily accessible to other researchers, which can best be achieved by entering it into large databases accessible over the internet. Overall, a lot of effort is needed to describe a large experiment in the detailed, standardised manner which allows others to understand it. In other projects, we are working on setting up common rules for the description of proteomics experiments. However, even the best rules are useless if they are not applied. As scientists, like everybody else, tend to do only the minimum amount of work to achieve their goals, their experiment description is often incomplete, focussing only on the aspects they consider relevant. And of course they tend to quickly tire of properly entering the data into databases if they have to use complicated, unfamiliar tools that they must install on their computer. On the other hand, there are programs they know well, because they use them almost every day anyway to manage their data. The main purpose of this proposal is to use one such tool, Microsoft Excel, to develop forms which allow scientists to enter their results into a database in as easy a manner as possible.
Biologists are used to Excel: they are familiar with its functionality, and they nearly always have it installed on their computer anyway. We plan to develop Excel forms which are as user-friendly as possible, but still capture all the data necessary to appropriately describe the results of a large experiment, according to established rules and standards. While Excel is often used to store experiment results, this is usually done in a very unsystematic manner, and it is usually very difficult to transfer the data into XML, a file format which is nowadays practically the standard way of entering data into databases. Also, so far it has been difficult to use and regularly update controlled vocabularies in Excel. Controlled vocabularies are lists of the words which can be entered in a specific field of a form, to avoid typing errors and to ensure everybody uses the same word for the same thing. In this project, we propose to develop advanced Excel forms for proteomics data harvesting. These forms should provide researchers with an easy tool to store their data in a systematic manner, ready to be sent to a database. These forms will be able to communicate with a database on the internet to provide up-to-date controlled vocabularies, and they will be able to send the data directly, in the form of XML, to a database on the internet. We will develop and test these forms for the existing PRIDE proteomics database, making use of the existing database for data storage, and using OLS, the ontology lookup service developed as part of PRIDE, to keep controlled vocabularies in the Excel forms up to date. By providing Excel forms as a user-friendly way to store proteomics data and send it to public databases, we hope to convince researchers to invest a little bit of extra effort to make their valuable data accessible to their colleagues by sending it to public databases, and thus to maximise the use of data paid for by the taxpayer anyway.
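
The idea described in the abstract (check each form field against a controlled vocabulary, then export the rows as XML) can be illustrated with a minimal Python sketch. This is not the project's implementation: the vocabulary contents, the field names, the XML element names, and the helper functions `validate_row` and `rows_to_xml` are all assumptions made for illustration, and the output is not the PRIDE XML format. In the proposal the vocabularies would be refreshed from OLS; here they are hard-coded.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: field name -> allowed terms.
# Hard-coded for illustration; the proposal would fetch these from OLS.
CONTROLLED_VOCAB = {
    "species": {"Homo sapiens", "Mus musculus"},
    "instrument": {"LTQ", "Q-TOF"},
}


def validate_row(row):
    """Return error messages for fields whose value is not an allowed term."""
    errors = []
    for field, allowed in CONTROLLED_VOCAB.items():
        value = row.get(field, "")
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not in the controlled vocabulary")
    return errors


def rows_to_xml(rows):
    """Serialize rows to an XML string, rejecting any row that fails validation.

    Element names ("experiment", "sample") are illustrative only.
    """
    root = ET.Element("experiment")
    for row in rows:
        errors = validate_row(row)
        if errors:
            raise ValueError("; ".join(errors))
        sample = ET.SubElement(root, "sample")
        for field, value in row.items():
            ET.SubElement(sample, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


# One spreadsheet row represented as a dict; "P12345" is a placeholder accession.
rows = [{"species": "Homo sapiens", "instrument": "LTQ", "protein": "P12345"}]
xml_text = rows_to_xml(rows)
```

A misspelled entry such as "Homo sapeins" would be reported by `validate_row` before any XML is produced, which is the typing-error protection that the abstract attributes to controlled vocabularies.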

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Rolf Apweiler

In Silico Characterization of Proteins: UniProt, InterPro and Integr8

DOI:
10.1007/s12033-007-9003-x
Published:
2007-10-04
Journal:
MOLECULAR BIOTECHNOLOGY
Impact factor:
2.500
Authors:
Nicola Jane Mulder;Paul Kersey;Manuela Pruess;Rolf Apweiler
Corresponding author:
Rolf Apweiler

Linking publication, gene and protein data

DOI:
10.1038/ncb1495
Published:
2006-11-01
Journal:
NATURE CELL BIOLOGY
Impact factor:
19.100
Authors:
Paul Kersey;Rolf Apweiler
Corresponding author:
Rolf Apweiler

Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions

DOI:
10.1186/1741-7007-5-44
Published:
2007-10-09
Journal:
BMC BIOLOGY
Impact factor:
4.500
Authors:
Samuel Kerrien;Sandra Orchard;Luisa Montecchi-Palazzi;Bruno Aranda;Antony F Quinn;Nisha Vinod;Gary D Bader;Ioannis Xenarios;Jérôme Wojcik;David Sherman;Mike Tyers;John J Salama;Susan Moore;Arnaud Ceol;Andrew Chatr-aryamontri;Matthias Oesterheld;Volker Stümpflen;Lukasz Salwinski;Jason Nerothin;Ethan Cerami;Michael E Cusick;Marc Vidal;Michael Gilson;John Armstrong;Peter Woollard;Christopher Hogue;David Eisenberg;Gianni Cesareni;Rolf Apweiler;Henning Hermjakob
Corresponding author:
Henning Hermjakob

Whither systems medicine?

DOI:
10.1038/emm.2017.290
Published:
2018-03-02
Journal:
EXPERIMENTAL AND MOLECULAR MEDICINE
Impact factor:
12.900
Authors:
Rolf Apweiler;Tim Beissbarth;Michael R Berthold;Nils Blüthgen;Yvonne Burmeister;Olaf Dammann;Andreas Deutsch;Friedrich Feuerhake;Andre Franke;Jan Hasenauer;Steve Hoffmann;Thomas Höfer;Peter LM Jansen;Lars Kaderali;Ursula Klingmüller;Ina Koch;Oliver Kohlbacher;Lars Kuepfer;Frank Lammert;Dieter Maier;Nico Pfeifer;Nicole Radde;Markus Rehm;Ingo Roeder;Julio Saez-Rodriguez;Ulrich Sax;Bernd Schmeck;Andreas Schuppert;Bernd Seilheimer;Fabian J Theis;Julio Vera;Olaf Wolkenhauer
Corresponding author:
Olaf Wolkenhauer