Finding Protein Sequence Motifs--methods And Applications
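Basic information
- Grant number: 8943217
- Principal investigator:
- Amount: $309,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period: to
- Project status: Ongoing
- Source: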
- Keywords: 5' Untranslated Regions; ATP phosphohydrolase; Adenoviruses; Alien; Alternative Splicing; Amino Acid Motifs; Amino Acid Sequence; Apoptosis; Archaea; Architecture; Bacteria; Binding; Binding Proteins; Biological; C-terminal; Capsid Proteins; Classification; Clustered Regularly Interspaced Short Palindromic Repeats; Code; Complement; Core Protein; DNA; DNA Binding Domain; DNA Transposable Elements; DNA Transposons; DNA Viruses; Data; Databases; Deoxyribonucleases; Elements; Eukaryota; Evolution; Exons; Family; Gene Expression Profile; Genes; Genome; Genomics; Goals; Helix-Turn-Helix Motifs; Homology Modeling; Immunity; Investigation; Joints; Ligand Binding; Methods; Minor; Names; Neighborhoods; Nucleic Acids; Nucleotides; Open Reading Frames; Operon; Pattern; Peptide Hydrolases; Peptide Sequence Determination; Play; Positioning Attribute; Process; Protein C; Protein Sequence Analysis; Proteins; Proteome; RNA; RNA polymerase sigma 54; Ribonucleases; Role; Sequence Analysis; Side; Signal Transduction; Structure; System; Tertiary Protein Structure; Tetrahymena thermophila; Time; Transcript; Transcription Initiation; Viral Genome; Virion; Virus; Virus Diseases; Winged Helix; Work; adaptive immunity; adenovirus penton protein; base; gene conservation; genome sequencing; markov model; plasmid DNA; protein profiling; protein structure; repaired; transcription termination; transcriptomics; viral DNA; virus host interaction
Project abstract
The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. In addition, hidden Markov models (HMMs), protein profile-against-profile comparison as implemented in the HHsearch method, protein structure comparison methods, homology modeling of protein structure, and genome context analysis were extensively applied.
Over the last year, we made further progress in the study of the classification, evolution, and functions of several classes of proteins and domains. Specifically, we analyzed the evolution and functions of protein domains that are involved in virus-host interactions, from both the host and the virus sides. The CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and "effector" domains most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54 dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains which sense modified nucleotides and nucleotide derivatives generated during virus infection. 
We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
Polintons (also known as Mavericks) and Tlr elements of Tetrahymena thermophila represent two families of large DNA transposons widespread in eukaryotes. We performed a detailed analysis of protein sequences encoded by these transposable elements and showed that both Polintons and Tlr elements encode two key virion proteins, the major capsid protein with the double jelly-roll fold and the minor capsid protein, known as the penton, with the single jelly-roll topology. This observation along with the previously noted conservation of the genes for viral genome packaging ATPase and adenovirus-like protease strongly suggests that Polintons and Tlr elements combine features of bona fide viruses and transposons. We proposed the name 'Polintoviruses' to denote these putative viruses that could have played a central role in the evolution of several groups of DNA viruses of eukaryotes.
These ongoing studies reveal new aspects of the remarkably diverse repertoire of protein domains involved in virus-host interactions.
As part of our ongoing investigation of the evolution of protein domain architectures, we analyzed the contributions of alternative splicing (AS), alternative transcription initiation (ATI), and alternative transcription termination (ATT) to the evolution of mammalian proteins. Together, AS, ATI and ATT create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that, contrary to the traditional view, the joint contribution of ATI and ATT to transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection, whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.
These studies enhance the existing understanding of the evolutionary plasticity of protein domain architecture.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eugene V Koonin
The common ancestry of life
- DOI: 10.1186/1745-6150-5-64
- Published: 2010-01-01
- Journal:
- Impact factor: 4.900
- Authors: Eugene V Koonin; Yuri I Wolf
- Corresponding author: Yuri I Wolf
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins
- DOI: 10.1038/ncomms13570
- Published: 2016-11-18
- Journal:
- Impact factor: 15.700
- Authors: Erez Persi; Yuri I. Wolf; Eugene V Koonin
- Corresponding author: Eugene V Koonin
Evolutionary primacy of sodium bioenergetics
- DOI: 10.1186/1745-6150-3-13
- Published: 2008-04-01
- Journal:
- Impact factor: 4.900
- Authors: Armen Y Mulkidjanian; Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
- Corresponding author: Eugene V Koonin
Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
- DOI: 10.1186/1471-2164-3-8
- Published: 2002-03-21
- Journal:
- Impact factor: 3.700
- Authors: Lakshminarayan M Iyer; Eugene V Koonin; L Aravind
- Corresponding author: L Aravind
Other grants held by Eugene V Koonin
Finding Protein Sequence Motifs--methods And Application
- Grant number: 6681337
- Fiscal year:
- Funding amount: $309,900
- Project category:
Finding Protein Sequence Motifs--Methods and Application
- Grant number: 6988455
- Fiscal year:
- Funding amount: $309,900
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 7969213
- Fiscal year:
- Funding amount: $309,900
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 9160910
- Fiscal year:
- Funding amount: $309,900
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7735068
- Fiscal year:
- Funding amount: $309,900
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7594460
- Fiscal year:
- Funding amount: $309,900
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 9555730
- Fiscal year:
- Funding amount: $309,900
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 6988458
- Fiscal year:
- Funding amount: $309,900
- Project category:
Similar overseas grants
Impact of alternative polyadenylation of 3'-untranslated regions in the PI3K/AKT cascade on microRNA
- Grant number: 573541-2022
- Fiscal year: 2022
- Funding amount: $309,900
- Project category: University Undergraduate Student Research Awards
How do untranslated regions of cannabinoid receptor type 1 mRNA determine receptor subcellular localisation and function?
- Grant number: 2744317
- Fiscal year: 2022
- Funding amount: $309,900
- Project category: Studentship
MICA: Synthetic untranslated regions for direct delivery of therapeutic mRNAs
- Grant number: MR/V010948/1
- Fiscal year: 2021
- Funding amount: $309,900
- Project category: Research Grant
Translational Control by 5'-untranslated regions
- Grant number: 10019570
- Fiscal year: 2019
- Funding amount: $309,900
- Project category:
Translational Control by 5'-untranslated regions
- Grant number: 10223370
- Fiscal year: 2019
- Funding amount: $309,900
- Project category:
Translational Control by 5'-untranslated regions
- Grant number: 10455108
- Fiscal year: 2019
- Funding amount: $309,900
- Project category:
Synergistic microRNA-binding sites, and 3' untranslated regions: a dialogue of silence
- Grant number: 255762
- Fiscal year: 2012
- Funding amount: $309,900
- Project category: Operating Grants
Analysis of long untranslated regions in Nipah virus genome
- Grant number: 20790351
- Fiscal year: 2008
- Funding amount: $309,900
- Project category: Grant-in-Aid for Young Scientists (B)
Search for mRNA elements involved in the compatibility between 5' untranslated regions and coding regions in chloroplast translation
- Grant number: 19370021
- Fiscal year: 2007
- Funding amount: $309,900
- Project category: Grant-in-Aid for Scientific Research (B)
Post-transcriptional Regulation of PPAR-g Expression by 5'-Untranslated Regions
- Grant number: 7131841
- Fiscal year: 2006
- Funding amount: $309,900
- Project category:













