Generating a full-length reference transcriptome for human protein-coding genes

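Basic information

  • Grant number:
    10687972
  • Principal investigator:
  • Amount:
    $663,500
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-08-22 to 2027-06-30
  • Project status:
    Ongoing

Project abstract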

Elucidating the coding potential of the genome has benefited from accurate genome sequences and extensive transcriptome sequencing, which together allow detailed models of protein-coding sequences (CDSs) or open reading frames (ORFs). Although at least one reliable full-length transcript model has been assigned to every protein-coding gene, the majority of alternative isoforms remain uncharacterized because of i) vast differences in expression levels between isoforms expressed from the same gene, and ii) the difficulty of obtaining full-length (FL) transcript sequences. Furthermore, there remains a large discrepancy between the total number of transcripts in annotation databases and the number for which there is an annotated FL transcript with experimental evidence. The spectrum of encoded transcripts comprises a vast but finite “isoform-space” with multiple dimensions: i) genes, ii) tissues and cell types, iii) development and time, iv) disease, and v) response to stimuli. Just as expression levels vary across cells and tissues, so can the relative abundance of alternatively spliced transcripts. A full, functional understanding of the human genome will not be possible without empirical knowledge and complete annotation of the entire complement of encoded functional proteins. Historically, gene annotation was supported predominantly by ESTs and mRNAs from INSDC databases, while automated approaches to annotation are now being applied to whole genomes and transcriptomes. However, current automated annotation does not provide data of the same quality as manual annotation: sensitivity and specificity are reduced, less functional annotation is captured, and all automated methods lack a manual annotator's capacity to introduce additional orthogonal data types and to interpret the scientific literature. Manual annotation, however, is highly labor-intensive. GENCODE release v36 represents the interpretation of nearly 10 million EST, cDNA and protein homologies.
Given the anticipated volumes of data, with single experiments producing more data than the entire INSDC catalogue, current methods of manual annotation do not scale. The emergence of long transcriptomic sequencing methods allows historical data types to be replaced, to the benefit of gene and transcript annotation. However, the massively greater data volumes already being deposited in public data archives exceed manual curation capacity, demanding automated solutions that do not compromise annotation quality. Furthermore, because untargeted sequencing approaches are very inefficient at discovering less abundant transcripts, the majority of sequence data generated gives very little insight into discoverable transcript diversity. To overcome these challenges, our two groups have joined forces to increase the catalog of fully experimentally verified full-length human protein-coding transcripts. This proposal focuses on integrating experimental approaches that will provide a comprehensive enumeration of human protein-coding transcripts, a “Reference Human Transcriptome”, together with developing an automated annotation pipeline that allows this resource to be integrated into GENCODE gene annotation.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by David E. Hill

Design and synthesis of a protein β-turn mimetic
  • DOI:
  • Publication date:
    1990
  • Journal:
  • Impact factor:
    0
  • Authors:
    G. Olson;M. Voss;David E. Hill;M. Kahn;V. Madison;C. Cook
  • Corresponding author:
    C. Cook
Ureteroscopy in Children
  • DOI:
    10.1016/s0022-5347(17)39496-x
  • Publication date:
    1990-08-01
  • Journal:
  • Impact factor:
  • Authors:
    David E. Hill;Joseph W. Segura;David E. Patterson;Stephen A. Kramer
  • Corresponding author:
    Stephen A. Kramer
Evaluating the accuracy of density functional theory for calculating 1H and 13C NMR chemical shifts in drug molecules
Evaluation of SynPhase Lanterns for capturing Ac-225 from bulk thorium
  • DOI:
    10.1007/s10967-018-5997-8
  • Publication date:
    2018-07-06
  • Journal:
  • Impact factor:
    1.600
  • Authors:
    Jonathan Fitzsimmons;Bryna Torre;Bryan Foley;Roy Copping;David E. Hill;Saed Mirzadeh;Cathy S. Cutler;Leonard Mausner;Dmitri Medvedev
  • Corresponding author:
    Dmitri Medvedev
Fully 3D Monte Carlo image reconstruction in SPECT using functional regions

Other grants by David E. Hill

Generating a full-length reference transcriptome for human protein-coding genes
  • Grant number:
    10331602
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
The 6th ORFeome Meeting: ORFeomes and Systems
  • Grant number:
    7225045
  • Fiscal year:
    2006
  • Award amount:
    $663,500
  • Project category:
Mapping the first half of the REFERENCE human binary protein interactome
  • Grant number:
    8518435
  • Fiscal year:
    1998
  • Award amount:
    $663,500
  • Project category:
Mapping the Human Binary Interactome Network
  • Grant number:
    7688648
  • Fiscal year:
    1998
  • Award amount:
    $663,500
  • Project category:
Mapping the Human Binary Interactome Network
  • Grant number:
    7530040
  • Fiscal year:
    1998
  • Award amount:
    $663,500
  • Project category:
Mapping the first half of the REFERENCE human binary protein interactome
  • Grant number:
    8245460
  • Fiscal year:
    1998
  • Award amount:
    $663,500
  • Project category:
Mapping the first half of the REFERENCE human binary protein interactome
  • Grant number:
    8666559
  • Fiscal year:
    1998
  • Award amount:
    $663,500
  • Project category:
Mapping the Human Binary Interactome Network
  • Grant number:
    7905208
  • Fiscal year:
    1998
  • Award amount:
    $663,500
  • Project category:
DETECTION OF ALTERED APC PROTEINS IN COLON CANCER CELLS
  • Grant number:
    3493423
  • Fiscal year:
    1993
  • Award amount:
    $663,500
  • Project category:
Resource Project
  • Grant number:
    8998371
  • Fiscal year:
  • Award amount:
    $663,500
  • Project category:

Similar overseas grants

Mechanisms that underlie the life/death decisions in a cell that activated apoptotic caspases
  • Grant number:
    10607815
  • Fiscal year:
    2023
  • Award amount:
    $663,500
  • Project category:
Nuclear and chromatin aberrations during non-apoptotic cell death in C. elegans and mammals
  • Grant number:
    10723868
  • Fiscal year:
    2023
  • Award amount:
    $663,500
  • Project category:
Non-apoptotic functions of caspase-3 in neural development
  • Grant number:
    10862033
  • Fiscal year:
    2023
  • Award amount:
    $663,500
  • Project category:
Apoptotic Donor Leukocytes to Promote Kidney Transplant Tolerance
  • Grant number:
    10622209
  • Fiscal year:
    2023
  • Award amount:
    $663,500
  • Project category:
Design of apoptotic cell mimetic anti-inflammatory polymers for the treatment of cytokine storm
  • Grant number:
    22H03963
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Identifying the mechanisms behind non-apoptotic functions of mitochondrial matrix-localized MCL-1
  • Grant number:
    10537709
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
Environmental Carcinogens Induce Minority MOMP to Initiate Carcinogenesis in Lung Cancer and Mesothelioma while Maintaining Apoptotic Resistance via Mcl-1
  • Grant number:
    10356565
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
Targeting apoptotic cells to enhance radiotherapy
  • Grant number:
    10708827
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
Activation of non-apoptotic cell death by the DNA damage response
  • Grant number:
    10388929
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
Role of natural immunity to self apoptotic exosomes in maintaining immune homeostasis
  • Grant number:
    RGPIN-2021-03004
  • Fiscal year:
    2022
  • Award amount:
    $663,500
  • Project category:
    Discovery Grants Program - Individual