Finding Protein Sequence Motifs--methods And Applications
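Basic information
- Grant number: 9555730
- Principal investigator:
- Amount: $319,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period: to
- Project status: Not concluded
- Source: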
- Keywords: Actinomyces Infections; Amino Acid Motifs; Amino Acid Sequence; Animals; Antitumor Response; Apoptosis; Archaeal Genome; Architecture; Bacteria; Bacterial Genome; Biological; Capsid Proteins; Censuses; Classification; Clustered Regularly Interspaced Short Palindromic Repeats; Collection; Complex; Custom; DNA; Death Domain; Development; Disease; Dissection; Eukaryota; Evolution; Family; Family member; Generations; Genes; Genome; Genome engineering; Genomics; Goals; Homology Modeling; Human; Individual; Investigation; Lead; Libraries; Life; Methodology; Methods; Mobile Genetic Elements; Nomenclature; Organism; Pattern; Periodicity; Phenotype; Planet Earth; Positioning Attribute; Process; Prokaryotic Cells; Property; Protein Analysis; Protein Family; Protein Structure Initiative; Proteins; RNA Binding; Recruitment Activity; Regulation; Research; Route; SAM Domain; Signal Transduction; Structure; System; Tertiary Protein Structure; Variant; Viral; Viral Genome; Virion; Virus; Work; adaptive immunity; database structure; design; exhaustion; experience; genetic element; genome editing; markov model; microbial; molecular sequence database; novel; nucleoside triphosphatase; polymerization; protein profiling; protein structure; sample fixation; tool; trait
Project abstract
The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI forms the basis of our work on protein motif analysis. In addition, Hidden Markov Models (HMMs), profile-against-profile comparison as implemented in the HHsearch method, protein structure comparison methods, homology modeling of protein structure, and genome context analysis were applied extensively and increasingly. Furthermore, custom libraries of protein domain profiles and computational pipelines for novel domain identification have been developed and applied.
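The profile idea behind the methods named above can be illustrated with a minimal sketch. This is not PSI-BLAST itself, which also weights sequences, uses data-dependent pseudocounts, and iterates the database search; the sketch only builds a log-odds position-specific scoring matrix from a toy gap-free alignment and slides it along a query. The function names, the uniform background frequency, the fixed pseudocount, and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds scoring matrix from equal-length, gap-free sequences.

    Assumption: a uniform background frequency (1/20) and a fixed additive
    pseudocount, both chosen only to keep the sketch short.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window(pssm, sequence):
    """Slide the matrix along the sequence; return (score, start) of the best window."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical toy alignment of a short glycine- and lysine-rich motif.
toy_alignment = ["GKSGKT", "GKTGKS", "GKSGKS"]
toy_pssm = build_pssm(toy_alignment)
score, start = best_window(toy_pssm, "MAAGKSGKTLLA")
print(f"best window starts at index {start} with score {score:.2f}")
```

In an iterative search, the sequences matched by such a matrix would be added to the alignment and the matrix rebuilt, which is what makes profile methods more sensitive than a single pairwise comparison.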
The research performed over the last year has led to further progress in the study of the classification, evolution, and functions of several classes of proteins and domains. In particular, we have performed a comprehensive analysis of the relationships among viral capsid proteins. Viruses are the most abundant biological entities on Earth and show remarkable diversity of genome sequences, replication and expression strategies, and virion structures. Evolutionary genomics of viruses has revealed many unexpected connections, but the general scenario(s) for the evolution of the virosphere remains a matter of intense debate among proponents of the cellular regression, escaped genes, and primordial virus world hypotheses. A comprehensive sequence and structure analysis of major virion proteins indicates that they evolved on about 20 independent occasions, and in some of these cases likely ancestors are identifiable among the proteins of cellular organisms. Virus genomes typically consist of distinct structural and replication modules that recombine frequently and can have different evolutionary trajectories. The results of this analysis suggest that, although the replication modules of at least some classes of viruses might descend from primordial selfish genetic elements, bona fide viruses evolved on multiple, independent occasions throughout the course of evolution by the recruitment of diverse host proteins that became major virion components.
In another project, we performed a detailed analysis and classification of the protein domains that comprise the Class 2 CRISPR-Cas systems, the microbial defense machinery that has recently been exploited for the development of a new generation of genome editing tools. Class 2 CRISPR-Cas systems are characterized by effector modules that consist of a single multidomain protein, such as Cas9 or Cpf1. We designed a computational pipeline for the discovery of novel Class 2 variants and used it to identify six new CRISPR-Cas subtypes. The diverse properties of these new systems provide potential for the development of versatile tools for genome editing and regulation. We performed a comprehensive census of Class 2 types and subtypes in complete and draft bacterial and archaeal genomes, outlined evolutionary scenarios for the independent origin of different Class 2 CRISPR-Cas systems from mobile genetic elements, and proposed an amended classification and nomenclature of CRISPR-Cas.
In a separate development, we performed an exhaustive computational dissection of the domain architecture of the SAMD9 family proteins, which are involved in antiviral and antitumor responses in humans. We show that the SAMD9 protein family is represented in most animals and also, unexpectedly, in bacteria, in particular actinomycetes. From the N to the C terminus, the core SAMD9 family architecture includes a DNA/RNA-binding AlbA domain, a variant Sir2-like domain, a STAND-like P-loop NTPase, an array of TPR repeats, and an OB-fold domain with predicted RNA-binding properties. Vertebrate SAMD9 family proteins contain the eponymous SAM domain, which is capable of polymerization, whereas some family members from other animals instead contain homotypic adaptor domains of the DEATH superfamily, known as dedicated components of apoptosis networks. Such complex domain architecture is reminiscent of the STAND superfamily NTPases that are involved in various signaling processes, including programmed cell death, in both eukaryotes and prokaryotes. These findings suggest that SAMD9 is a hub of a novel, evolutionarily conserved defense network that remains to be characterized.
In a more theoretically oriented project, we performed a genomic census and evolutionary analysis of repeat arrays in diverse protein families. Protein repeats are considered hotspots of protein evolution, associated with the acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare the paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. We developed a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterized their evolution. We showed that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under the combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
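The abstract does not describe how periodic repeats were detected, so the following is only a toy illustration of what "periodicity" means for a sequence, not the project's methodology. It scores each candidate period by the fraction of positions whose residue is identical to the residue one period downstream, then reports the best-scoring period. The function names, the exact-identity criterion, and the example sequence are assumptions; a real analysis would use substitution-aware similarity and a significance test.

```python
def self_match_fraction(seq, period):
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    pairs = len(seq) - period
    if period <= 0 or pairs <= 0:
        raise ValueError("period must be positive and shorter than the sequence")
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs


def best_period(seq, min_period=2, max_period=None):
    """Return (period, score) for the candidate period with the highest score.

    Multiples of the true period score equally well on a perfect repeat,
    so ties are broken in favor of the shortest period.
    """
    if max_period is None:
        max_period = len(seq) // 2
    scores = {p: self_match_fraction(seq, p)
              for p in range(min_period, max_period + 1)}
    period = max(scores, key=lambda p: (scores[p], -p))
    return period, scores[period]


# Hypothetical sequence: four perfect copies of a six-residue unit.
toy_sequence = "GAPLTQ" * 4
period, score = best_period(toy_sequence)
print(f"period {period}, self-match fraction {score:.2f}")
```

Once repeat units are delimited in this way, copies within one protein can be compared with each other and with their orthologous counterparts, which is the horizontal versus vertical comparison described above. The dN/dS ratio is then read in the usual way: values below 1 indicate purifying selection, values near 1 indicate neutrality or relaxed constraint, and values above 1 indicate positive selection.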
Taken together, these studies expand the known repertoire of protein domains with defined functions and have led to the discovery of novel, biologically important functional systems in diverse organisms, some of which are expected to have practical implications, e.g., in genome engineering. The findings also contribute to the current understanding of the routes of protein evolution.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eugene V Koonin
The common ancestry of life
- DOI: 10.1186/1745-6150-5-64
- Published: 2010-01-01
- Journal:
- Impact factor: 4.900
- Authors: Eugene V Koonin; Yuri I Wolf
- Corresponding author: Yuri I Wolf
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins
- DOI: 10.1038/ncomms13570
- Published: 2016-11-18
- Journal:
- Impact factor: 15.700
- Authors: Erez Persi; Yuri I. Wolf; Eugene V Koonin
- Corresponding author: Eugene V Koonin
Evolutionary primacy of sodium bioenergetics
- DOI: 10.1186/1745-6150-3-13
- Published: 2008-04-01
- Journal:
- Impact factor: 4.900
- Authors: Armen Y Mulkidjanian; Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
- Corresponding author: Eugene V Koonin
Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
- DOI: 10.1186/1471-2164-3-8
- Published: 2002-03-21
- Journal:
- Impact factor: 3.700
- Authors: Lakshminarayan M Iyer; Eugene V Koonin; L Aravind
- Corresponding author: L Aravind
Other grants held by Eugene V Koonin
Finding Protein Sequence Motifs--methods And Application
- Grant number: 6681337
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs--Methods and Application
- Grant number: 6988455
- Fiscal year:
- Funding amount: $319,100
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 7969213
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 8943217
- Fiscal year:
- Funding amount: $319,100
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 9160910
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7735068
- Fiscal year:
- Funding amount: $319,100
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7594460
- Fiscal year:
- Funding amount: $319,100
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 6988458
- Fiscal year:
- Funding amount: $319,100
- Project category:
Similar overseas grants
Elucidating the biophysics of pre-fibrillar, toxic tau oligomers: from amino acid motifs to neuronal dysfunction
- Grant number: 10461322
- Fiscal year: 2021
- Funding amount: $319,100
- Project category:
Elucidating the biophysics of pre-fibrillar, toxic tau oligomers: from amino acid motifs to neuronal dysfunction
- Grant number: 10489810
- Fiscal year: 2021
- Funding amount: $319,100
- Project category:
Detection of amino acid motifs on the agretopes of antigens highly bound to MHC molecules
- Grant number: 03670243
- Fiscal year: 1991
- Funding amount: $319,100
- Project category: Grant-in-Aid for General Scientific Research (C)













