An automated pipeline for construction of Reference Transcript Datasets (RTD) to enable rapid and accurate gene expression analysis in plant species
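Basic information
- Grant number: BB/S020160/1
- Principal investigator:
- Amount: $403,800
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2019
- Funding country: United Kingdom
- Duration: 2019 to (no end date recorded)
- Project status: Completed
- Source:
- Keywords:
Project abstract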
A gene is the basic physical and functional unit in the genome. Genes are switched on and off at different stages of development and in response to external and internal signals. Protein-coding genes are copied (transcribed) into precursor messenger RNAs (pre-mRNAs), which are processed in different ways into mRNAs that can then be translated into proteins. A goal of biological research is to understand how genes work by measuring changes in gene expression. This is achieved by estimating the abundance of all of the transcripts produced at a particular time or under a particular condition.
The current technology for measuring gene and transcript expression is RNA sequencing (RNA-seq), which, by sequencing millions of transcripts, allows RNA levels to be measured on a genome-wide scale. The two main platforms are Illumina, which generates short reads (currently 75 to 250 bp), and PacBio/Nanopore single-molecule sequencing, which produces full-length transcript reads. To measure gene expression, Illumina short reads are often mapped to the genome and assembled into transcripts, which is an inaccurate process. PacBio and Nanopore have high sequencing error rates and do not generate sufficient depth of coverage of genes. These technologies continue to advance rapidly in both chemistry and computational analysis, but a combination of the platforms is currently the best approach to generating RNA-seq data.
In addition, the fastest and most accurate programs for computational quantification of transcript and gene expression require a comprehensive catalogue of transcripts, which we call a Reference Transcript Dataset (RTD). Over the last four years, we developed an RTD for Arabidopsis (AtRTD2) based on extensive Illumina short-read sequences. Through a series of iterations, we developed computational methods to identify and retain high-confidence transcripts while removing false transcripts.
AtRTD2 greatly increased the accuracy of quantification, allowing, for example, the identification of novel transcription and splicing factors in response to cold. The challenge now is to translate this knowledge and experience to other plant and crop (and animal) species. Currently, transcript sequence catalogues for most plant species are incomplete, missing large numbers of transcripts, and for species with RNA-seq data, out-of-date analysis procedures have produced large numbers of false transcripts. From developing AtRTD2, we have a prototype pipeline for constructing an RTD. Its key features are multiple quality control filters that remove mis-assembled transcripts, redundant transcripts, chimaeric transcripts and transcript fragments. These multiple, iterative steps are currently coded individually, and while the pipeline can be used, it takes up to 12 months to generate an RTD and requires the full-time expertise of a bioinformatician.
We will develop a fully automated pipeline (RTDBox) that can be used by scientists with basic bioinformatics skills or by bioinformaticians with little experience in transcriptomics. The pipeline will also be designed to allow incremental improvement of the RTD through the automatic incorporation of new RNA-seq data (Illumina, PacBio, Nanopore). Within the pipeline, we will develop a transcript evaluation suite (TES) that will provide evaluation metrics to help biologists identify and remove mis-constructed transcripts produced by assembly programs and understand the quality and completeness of the RTD generated. All our experience and expertise will be brought together to make user-friendly software that allows plant scientists to measure gene expression more accurately and thereby improve the exploration of biological processes across the globe.
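As a rough illustration of two of the quality control filters named in the abstract (redundant transcripts and transcript fragments), the sketch below compares transcripts by their intron chains. It is not the RTDBox or AtRTD2 implementation: the `Transcript` class, the function names, and the rule that a fragment is a transcript whose intron chain is a contiguous part of a longer chain are all assumptions made for this example, and all transcripts are assumed to lie on the same strand of one gene.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model: a name plus exons sorted by position."""
    name: str
    exons: Tuple[Interval, ...]

    def intron_chain(self) -> Tuple[Interval, ...]:
        # Each intron spans the gap between two consecutive exons.
        return tuple(
            (left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(self.exons, self.exons[1:])
        )

    def length(self) -> int:
        # Total exonic length in bases.
        return sum(end - start + 1 for start, end in self.exons)


def is_contiguous_subchain(short: Tuple[Interval, ...],
                           long: Tuple[Interval, ...]) -> bool:
    """True if `short` is a strictly shorter, contiguous run inside `long`."""
    n = len(short)
    if n == 0 or n >= len(long):
        return False
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def filter_transcripts(transcripts: List[Transcript]) -> List[Transcript]:
    """Assumed filter: collapse redundant transcripts, then drop fragments."""
    single_exon: List[Transcript] = []
    best: Dict[Tuple[Interval, ...], Transcript] = {}

    # Step 1: transcripts with identical intron chains are treated as
    # redundant; keep only the longest representative of each chain.
    for t in transcripts:
        chain = t.intron_chain()
        if not chain:
            single_exon.append(t)  # single-exon transcripts pass through
        elif chain not in best or t.length() > best[chain].length():
            best[chain] = t

    # Step 2: a transcript is treated as a fragment if its intron chain is
    # fully contained, contiguously, in another retained transcript's chain.
    kept = [
        t for chain, t in best.items()
        if not any(is_contiguous_subchain(chain, other) for other in best)
    ]
    return single_exon + kept
```

A real pipeline would need further evidence before discarding a transcript, since a shorter isoform with an internal start site can be genuine; the containment rule here is only the simplest possible stand-in for the filters the abstract describes.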
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The value of genotype-specific reference for transcriptome analyses in barley.
- DOI: 10.26508/lsa.202101255
- Published: 2022-08
- Journal:
- Impact factor: 4.4
- Authors: Guo, Wenbin; Coulter, Max; Waugh, Robbie; Zhang, Runxuan
- Corresponding author: Zhang, Runxuan
3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists.
- DOI: 10.1080/15476286.2020.1858253
- Published: 2021-11
- Journal:
- Impact factor: 4.1
- Authors: Guo W; Tzioutziou NA; Stephen G; Milne I; Calixto CP; Waugh R; Brown JWS; Zhang R
- Corresponding author: Zhang R
BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq
- DOI: 10.1186/s12864-019-6243-7
- Published: 2019-12-11
- Journal:
- Impact factor: 4.4
- Authors: Rapazote-Flores, Paulo; Bayer, Micha; Simpson, Craig G.
- Corresponding author: Simpson, Craig G.
Other publications by Runxuan Zhang
Representations of $\omega$-Lie Algebras and Tailed Derivations of Lie Algebras
- DOI:
- Published: 2019-10
- Journal:
- Impact factor: 0
- Authors: Runxuan Zhang
- Corresponding author: Runxuan Zhang
Cohomologies and Deformations of Generalized Left-symmetric Algebras
- DOI:
- Published: 2013-02
- Journal:
- Impact factor: 0
- Authors: Runxuan Zhang
- Corresponding author: Runxuan Zhang
The role of picornavirus infection in epileptogenesis
- DOI: 10.1186/s42494-021-00040-6
- Published: 2021-03
- Journal:
- Impact factor: 0
- Authors: Runxuan Zhang; Jie Mu; Jing Chi; Weijia Jiang; Xiaosa Chi
- Corresponding author: Xiaosa Chi
Evaluation for computational platforms of LC-MS based label-free quantitative proteomics: A global view
- DOI:
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: Runxuan Zhang; A. Barton; J. Brittenden; J. Huang; Daniel J. Crowther
- Corresponding author: Daniel J. Crowther
Simple $\omega$-Lie Algebras and 4-Dimensional $\omega$-Lie Algebras Over $\mathbb{C}$
- DOI: 10.1007/s40840-015-0120-6
- Published: 2015-04-29
- Journal:
- Impact factor: 1.200
- Authors: Yin Chen; Runxuan Zhang
- Corresponding author: Runxuan Zhang
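Similar NSFC grants
Pipeline development for FAST continuous observation data processing
- Grant number:
- Year approved: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Research on key technologies for a real-time pulsar backend pipeline for the Yunnan Observatories 40 m radio telescope
- Grant number:
- Year approved: 2020
- Funding amount: CNY 370,000
- Project type: Regional Science Fund project
Research on key technologies for a high-performance FAST pipeline
- Grant number: U1731125
- Year approved: 2017
- Funding amount: CNY 460,000
- Project type: Joint Fund project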
Similar overseas grants
Construction of Pipeline for Integrated Metabolome Analyses of Kidney Disease
- Grant number: 22K08317
- Fiscal year: 2022
- Funding amount: $403,800
- Project type: Grant-in-Aid for Scientific Research (C)
shRNAmir and CRISPR sgRNA Library Construction Core
- Grant number: 10024585
- Fiscal year: 2020
- Funding amount: $403,800
- Project type:
shRNAmir and CRISPR sgRNA Library Construction Core
- Grant number: 10224890
- Fiscal year: 2020
- Funding amount: $403,800
- Project type:
Sustainable automated pipeline construction development for MASIP
- Grant number: 77597
- Fiscal year: 2020
- Funding amount: $403,800
- Project type: Collaborative R&D
shRNAmir and CRISPR sgRNA Library Construction Core
- Grant number: 10488582
- Fiscal year: 2020
- Funding amount: $403,800
- Project type:
Development of a method for highly accurate genome construction from single sperm cells using single-cell analysis.
- Grant number: 20K06607
- Fiscal year: 2020
- Funding amount: $403,800
- Project type: Grant-in-Aid for Scientific Research (C)
shRNAmir and CRISPR sgRNA Library Construction Core
- Grant number: 10683260
- Fiscal year: 2020
- Funding amount: $403,800
- Project type:
The construction of large-scale microsatellite database and bioinformatics pipeline system
- Grant number: 19K16113
- Fiscal year: 2019
- Funding amount: $403,800
- Project type: Grant-in-Aid for Early-Career Scientists
University of Washington Center of Excellence in Opioid Addiction Research
- Grant number: 10611870
- Fiscal year: 2019
- Funding amount: $403,800
- Project type:
University of Washington Center of Excellence in Opioid Addiction Research
- Grant number: 10152567
- Fiscal year: 2019
- Funding amount: $403,800
- Project type: