
PIT-DB: A Resource for Sharing, Annotating and Analysing Translated Genomic Elements


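Basic information

Grant reference:
BB/M020118/1
Principal investigator:
Conrad Bessant
Amount:
$156,400
Host institution:
Queen Mary University of London
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2015
Funding country:
United Kingdom
Start and end dates:
2015 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=BB%2FM020118%2F1
Keywords:
PIT DB Resource Sharing Annotating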

Project abstract

The publication of the human genome in 2001 was rightly hailed as a major scientific achievement, but over a decade later we are still far from a complete understanding of the structure of the genome and the role of the various elements within it. While protein-coding regions of the genome were identified and used to annotate the genome soon after it was sequenced, many more exotic genomic elements have subsequently attracted interest, including pseudogenes, non-coding RNAs and short open reading frames (sORFs).

In recent years, post-genomic bioanalytical techniques such as RNA-seq transcriptomics (which tells us which genomic elements are expressed) and mass spectrometry-based proteomics (which tells us which of the expressed elements are translated into peptides or proteins) have helped refine our understanding of the human genome at a fundamental level. Just this year, two proteomics studies published in Nature caused a stir by showing that no experimental evidence could be found for the expression of several genomic elements widely accepted to code for protein, while other regions of the genome that were not previously thought to be protein coding were in fact found to produce proteins. If this is the situation for the intensively studied human genome, we must assume that the genome annotations for less studied species (so-called non-model organisms) are even less accurate.

We recently developed (and tested, and published) a methodology called proteomics informed by transcriptomics (PIT) that rapidly generates large numbers of genome annotations underpinned by multiple sources of experimental evidence. In PIT, every sample is analysed using both RNA-seq and proteomic mass spectrometry, and the data from these two analyses are integrated to provide a list of observed proteins and any other translated genomic elements (TGEs), together with the detailed transcriptomic and spectral evidence that underpins these observations. The beauty of PIT compared with traditional proteomics is that no prior sequence knowledge is needed, so novel TGEs (be they proteins or other more exotic features) can be detected. RNA-seq can be used by itself to rapidly generate genome annotations without prior knowledge, but without PIT's mass spectrometry step the confidence in these annotations is limited and there is no guarantee that transcribed elements actually get translated.

In a recent BBSRC TRDF project we developed easy-to-use web-based software workflows, implemented in the popular Galaxy platform, to process the data from PIT experiments in a repeatable way with uniformly formatted output files. This has proven very useful for answering individual biological questions, but there is currently no meaningful way to share the results of PIT experiments. In this project we propose to plug this gap by developing PIT-DB, a web-accessible database of results produced by PIT. This publicly available database will immediately be populated with data from experiments conducted on various species at the University of Bristol, but other groups will be actively encouraged to submit their own data.

Having data from multiple PIT experiments in one database will deliver exciting new scientific insights. As well as simply allowing researchers to share their results from individual PIT experiments, PIT-DB will pool information about individual novel TGEs from multiple experiments so evidence can be accumulated for each individual TGE. Improving the quality of results by using data from replicate experiments is a fundamental concept in science, and the utility of doing this on a community-wide basis has been repeatedly demonstrated by other bioinformatics databases such as Ensembl, UniProt and PRIDE. As well as being of interest individually, the well-evidenced TGEs in PIT-DB will provide large numbers of experimentally derived (as opposed to computationally predicted) genome annotations for all of the species for which data is present in the database.
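
The idea of pooling evidence for each TGE across experiments can be illustrated with a minimal Python sketch. It is not the PIT-DB implementation or schema: the `Observation` record, its fields, the `pool_evidence` function and the sample values are all hypothetical, and it assumes that a TGE can be keyed by its amino acid sequence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One TGE seen in one experiment (hypothetical record layout)."""
    experiment: str   # identifier of the PIT experiment
    sequence: str     # amino acid sequence used here as the TGE key
    peptides: int     # number of supporting peptide identifications


def pool_evidence(observations):
    """Accumulate evidence per TGE across all experiments."""
    pooled = defaultdict(lambda: {"experiments": set(), "peptides": 0})
    for obs in observations:
        entry = pooled[obs.sequence]
        entry["experiments"].add(obs.experiment)
        entry["peptides"] += obs.peptides
    return dict(pooled)


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    sample = [
        Observation("exp1", "MKTAYIAK", 3),
        Observation("exp2", "MKTAYIAK", 2),
        Observation("exp1", "MALWMRLL", 1),
    ]
    for sequence, evidence in pool_evidence(sample).items():
        print(sequence, sorted(evidence["experiments"]), evidence["peptides"])
```

A TGE observed in several independent experiments ends up with a larger experiment set and peptide count, which is one simple way to express accumulated support.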


Project outputs

Journal articles (4)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Novel application of heuristic optimisation enables the creation and thorough evaluation of robust support vector machine ensembles for machine learning applications.

DOI:
10.1007/s11306-015-0894-4
Published:
2016
Journal:
Metabolomics: Official journal of the Metabolomic Society
Impact factor:
0
Authors:
Chatzimichali EA;Bessant C
Corresponding author:
Bessant C

Proteomics technique opens new frontiers in mobilome research.

DOI:
10.1080/2159256x.2017.1362494
Published:
2017
Journal:
Mobile Genetic Elements
Impact factor:
0
Authors:
Davidson AD;Matthews DA;Maringer K
Corresponding author:
Maringer K

Proteomics informed by transcriptomics for characterising active transposable elements and genome annotation in Aedes aegypti.


DOI:
10.1186/s12864-016-3432-5
Published:
2017-01-19
Journal:
BMC Genomics
Impact factor:
4.4
Authors:
Maringer K;Yousuf A;Heesom KJ;Fan J;Lee D;Fernandez-Sesma A;Bessant C;Matthews DA;Davidson AD
Corresponding author:
Davidson AD

PITDB: a database of translated genomic elements.

DOI:
10.1093/nar/gkx906
Published:
2018-01-04
Journal:
Nucleic Acids Research
Impact factor:
14.9
Authors:
Saha S;Chatzimichali EA;Matthews DA;Bessant C
Corresponding author:
Bessant C


Other publications by Conrad Bessant

Deriving Meaningful Aspects of Health Related to Physical Activity in Chronic Disease: Concept Elicitation Using Machine Learning–Assisted Coding of Online Patient Conversations

DOI:
10.1016/j.jval.2023.01.022
Published:
2023-07-01
Journal:
Research article
Impact factor:
Authors:
Bill Byrom;Conrad Bessant;Fabrizio Smeraldi;Maryam Abdollahyan;Yasemin Bridges;Marzana Chowdhury;Asiyya Tahsin
Corresponding author:
Asiyya Tahsin