Finding Protein Sequence Motifs--methods And Applications

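Basic information

  • Grant number:
    8943217
  • Principal investigator:
  • Amount:
    $309,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Project period:
  • Project status:
    Not yet concluded

Project abstract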

The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMM), protein profile-against-profile comparison implemented in the HHsearch method, protein structure comparison methods, homology modeling of protein structure and genome context analysis were extensively applied. Over the last year, we made further progress in the study of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we analyzed the evolution and functions of protein domains that are involved in virus-host interactions, from both the host and the virus sides.
The CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and "effector" domains, most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54 dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
Polintons (also known as Mavericks) and Tlr elements of Tetrahymena thermophila represent two families of large DNA transposons widespread in eukaryotes. We performed a detailed analysis of protein sequences encoded by these transposable elements and showed that both Polintons and Tlr elements encode two key virion proteins, the major capsid protein with the double jelly-roll fold and the minor capsid protein, known as the penton, with the single jelly-roll topology. This observation, along with the previously noted conservation of the genes for viral genome packaging ATPase and adenovirus-like protease, strongly suggests that Polintons and Tlr elements combine features of bona fide viruses and transposons. We proposed the name 'Polintoviruses' to denote these putative viruses that could have played a central role in the evolution of several groups of DNA viruses of eukaryotes. These ongoing studies reveal new aspects of the remarkably diverse repertoire of protein domains involved in virus-host interactions.
As part of our ongoing investigation of the evolution of protein domain architectures, we analyzed the contributions of alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) to the evolution of mammalian proteins. Together, AS, ATI and ATT create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that, contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns. These studies enhance the existing understanding of the evolutionary plasticity of protein domain architecture.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Eugene V Koonin

The common ancestry of life
  • DOI:
    10.1186/1745-6150-5-64
  • Published:
    2010-01-01
  • Journal:
  • Impact factor:
    4.900
  • Authors:
    Eugene V Koonin;Yuri I Wolf
  • Corresponding author:
    Yuri I Wolf
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Takahiro Shimosaka;Kira S Makarova;Eugene V Koonin;Haruyuki Atomi
  • Corresponding author:
    Haruyuki Atomi
Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins
  • DOI:
    10.1038/ncomms13570
  • Published:
    2016-11-18
  • Journal:
  • Impact factor:
    15.700
  • Authors:
    Erez Persi;Yuri I. Wolf;Eugene V Koonin
  • Corresponding author:
    Eugene V Koonin
Evolutionary primacy of sodium bioenergetics
  • DOI:
    10.1186/1745-6150-3-13
  • Published:
    2008-04-01
  • Journal:
  • Impact factor:
    4.900
  • Authors:
    Armen Y Mulkidjanian;Michael Y Galperin;Kira S Makarova;Yuri I Wolf;Eugene V Koonin
  • Corresponding author:
    Eugene V Koonin
Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
  • DOI:
    10.1186/1471-2164-3-8
  • Published:
    2002-03-21
  • Journal:
  • Impact factor:
    3.700
  • Authors:
    Lakshminarayan M Iyer;Eugene V Koonin;L Aravind
  • Corresponding author:
    L Aravind

Other grants held by Eugene V Koonin

Finding Protein Sequence Motifs--methods And Application
  • Grant number:
    6681337
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--Methods and Application
  • Grant number:
    6988455
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    7969213
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    9160910
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7735068
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7594460
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    9555730
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6111075
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    6988458
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6432755
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:

Similar overseas grants

Impact of alternative polyadenylation of 3'-untranslated regions in the PI3K/AKT cascade on microRNA
  • Grant number:
    573541-2022
  • Fiscal year:
    2022
  • Funding amount:
    $309,900
  • Project category:
    University Undergraduate Student Research Awards
How do untranslated regions of cannabinoid receptor type 1 mRNA determine receptor subcellular localisation and function?
  • Grant number:
    2744317
  • Fiscal year:
    2022
  • Funding amount:
    $309,900
  • Project category:
    Studentship
MICA:Synthetic untranslated regions for direct delivery of therapeutic mRNAs
  • Grant number:
    MR/V010948/1
  • Fiscal year:
    2021
  • Funding amount:
    $309,900
  • Project category:
    Research Grant
Translational Control by 5'-untranslated regions
  • Grant number:
    10019570
  • Fiscal year:
    2019
  • Funding amount:
    $309,900
  • Project category:
Translational Control by 5'-untranslated regions
  • Grant number:
    10223370
  • Fiscal year:
    2019
  • Funding amount:
    $309,900
  • Project category:
Translational Control by 5'-untranslated regions
  • Grant number:
    10455108
  • Fiscal year:
    2019
  • Funding amount:
    $309,900
  • Project category:
Synergistic microRNA-binding sites, and 3' untranslated regions: a dialogue of silence
  • Grant number:
    255762
  • Fiscal year:
    2012
  • Funding amount:
    $309,900
  • Project category:
    Operating Grants
Analysis of long untranslated regions in Nipah virus genome
  • Grant number:
    20790351
  • Fiscal year:
    2008
  • Funding amount:
    $309,900
  • Project category:
    Grant-in-Aid for Young Scientists (B)
Search for mRNA elements involved in the compatibility between 5' untranslated regions and coding regions in chloroplast translation
  • Grant number:
    19370021
  • Fiscal year:
    2007
  • Funding amount:
    $309,900
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Post-transcriptional Regulation of PPAR-g Expression by 5'-Untranslated Regions
  • Grant number:
    7131841
  • Fiscal year:
    2006
  • Funding amount:
    $309,900
  • Project category: