Finding Protein Sequence Motifs--methods And Applications

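Basic information

  • Grant number:
    8943217
  • Principal investigator:
  • Amount:
    $309,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Project period:
  • Project status:
    Not concluded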

Project abstract

The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMMs), protein profile-against-profile comparison implemented in the HHsearch method, protein structure comparison methods, homology modeling of protein structure and genome context analysis were extensively applied.
Over the last year, we made further progress in the study of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we analyzed the evolution and functions of protein domains that are involved in virus-host interactions, from both the host and the virus sides. The CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined, but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and "effector" domains, most of which are predicted to possess DNase or RNase activity.
Divergent CARF domains are also found in RtcR proteins, sigma-54-dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains, which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains, which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
Polintons (also known as Mavericks) and Tlr elements of Tetrahymena thermophila represent two families of large DNA transposons widespread in eukaryotes. We performed a detailed analysis of protein sequences encoded by these transposable elements and showed that both Polintons and Tlr elements encode two key virion proteins, the major capsid protein with the double jelly-roll fold and the minor capsid protein, known as the penton, with the single jelly-roll topology. This observation, along with the previously noted conservation of the genes for viral genome packaging ATPase and adenovirus-like protease, strongly suggests that Polintons and Tlr elements combine features of bona fide viruses and transposons. We proposed the name 'Polintoviruses' to denote these putative viruses that could have played a central role in the evolution of several groups of DNA viruses of eukaryotes.
These ongoing studies reveal new aspects of the remarkably diverse repertoire of protein domains involved in virus-host interactions.
As part of our ongoing investigation of the evolution of protein domain architectures, we analyzed the contributions of alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) to the evolution of mammalian proteins. Together, AS, ATI and ATT create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that, contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection, whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.
These studies enhance the existing understanding of the evolutionary plasticity of protein domain architecture.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Eugene V Koonin

The common ancestry of life
  • DOI:
    10.1186/1745-6150-5-64
  • Published:
    2010-01-01
  • Journal:
  • Impact factor:
    4.900
  • Authors:
    Eugene V Koonin; Yuri I Wolf
  • Corresponding author:
    Yuri I Wolf
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
  • Corresponding author:
    Haruyuki Atomi
Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins
  • DOI:
    10.1038/ncomms13570
  • Published:
    2016-11-18
  • Journal:
  • Impact factor:
    15.700
  • Authors:
    Erez Persi; Yuri I. Wolf; Eugene V Koonin
  • Corresponding author:
    Eugene V Koonin
Evolutionary primacy of sodium bioenergetics
  • DOI:
    10.1186/1745-6150-3-13
  • Published:
    2008-04-01
  • Journal:
  • Impact factor:
    4.900
  • Authors:
    Armen Y Mulkidjanian; Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
  • Corresponding author:
    Eugene V Koonin
Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
  • DOI:
    10.1186/1471-2164-3-8
  • Published:
    2002-03-21
  • Journal:
  • Impact factor:
    3.700
  • Authors:
    Lakshminarayan M Iyer; Eugene V Koonin; L Aravind
  • Corresponding author:
    L Aravind

Other grants held by Eugene V Koonin

Finding Protein Sequence Motifs--methods And Application
  • Grant number:
    6681337
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--Methods and Application
  • Grant number:
    6988455
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    7969213
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    9160910
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7735068
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7594460
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    9555730
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6111075
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    6988458
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6432755
  • Fiscal year:
  • Funding amount:
    $309,900
  • Project category:

Similar overseas grants

Impact of alternative polyadenylation of 3'-untranslated regions in the PI3K/AKT cascade on microRNA
  • Grant number:
    573541-2022
  • Fiscal year:
    2022
  • Funding amount:
    $309,900
  • Project category:
    University Undergraduate Student Research Awards
How do untranslated regions of cannabinoid receptor type 1 mRNA determine receptor subcellular localisation and function?
  • Grant number:
    2744317
  • Fiscal year:
    2022
  • Funding amount:
    $309,900
  • Project category:
    Studentship
MICA:Synthetic untranslated regions for direct delivery of therapeutic mRNAs
  • Grant number:
    MR/V010948/1
  • Fiscal year:
    2021
  • Funding amount:
    $309,900
  • Project category:
    Research Grant
Translational Control by 5'-untranslated regions
  • Grant number:
    10019570
  • Fiscal year:
    2019
  • Funding amount:
    $309,900
  • Project category:
Translational Control by 5'-untranslated regions
  • Grant number:
    10223370
  • Fiscal year:
    2019
  • Funding amount:
    $309,900
  • Project category:
Translational Control by 5'-untranslated regions
  • Grant number:
    10455108
  • Fiscal year:
    2019
  • Funding amount:
    $309,900
  • Project category:
Synergistic microRNA-binding sites, and 3' untranslated regions: a dialogue of silence
  • Grant number:
    255762
  • Fiscal year:
    2012
  • Funding amount:
    $309,900
  • Project category:
    Operating Grants
Analysis of long untranslated regions in Nipah virus genome
  • Grant number:
    20790351
  • Fiscal year:
    2008
  • Funding amount:
    $309,900
  • Project category:
    Grant-in-Aid for Young Scientists (B)
Search for mRNA elements involved in the compatibility between 5' untranslated regions and coding regions in chloroplast translation
  • Grant number:
    19370021
  • Fiscal year:
    2007
  • Funding amount:
    $309,900
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Post-transcriptional Regulation of PPAR-g Expression by 5'-Untranslated Regions
  • Grant number:
    7131841
  • Fiscal year:
    2006
  • Funding amount:
    $309,900
  • Project category: