Comparative Analysis Of Completely Sequenced Genomes

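Basic information

  • Grant number:
    7316251
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Project period:
  • Project status:
    Not concluded

Project abstract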

The rapidly growing database of completely sequenced genomes of bacteria, archaea and eukaryotes (over 200 genomes available by the end of 2004 and many more in progress) creates both new opportunities and new challenges for genome research. During the last year, we performed several studies that took advantage of this genomic information to establish fundamental principles of genome evolution and function.

By comparing sequences of human, mouse and rat orthologous genes, we showed that in the 5'-untranslated regions (5'-UTRs) of mammalian cDNAs, but not in 3'-UTRs or coding sequences, AUG is conserved to a significantly greater extent than any of the other 63 nucleotide triplets. Qualitatively similar results were obtained by comparing orthologous genes from different species of the yeast genus Saccharomyces. Together with the observation that mammalian and yeast 5'-UTRs are significantly depleted in overall AUG content, these findings suggest that AUG triplets in 5'-UTRs are subject to purifying selection in two opposite directions: upstream AUGs (uAUGs) that have no specific function tend to be deleterious and are eliminated during evolution, whereas uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5'-UTRs.

In another project, we assessed the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examined the origins of the ancestral paralogs and their potential roles in the emergence of eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 4137/2150 ≈ 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes: 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the ancestral paralogs, which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study identify a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.

We also studied universal trends in the evolution of the amino acid composition of proteins. We compared sets of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life (Bacteria, Archaea and Eukaryota), and used phylogenies to polarize amino acid substitutions. Cys, Met, His, Ser and Phe accrue in at least 14 taxa, whereas Pro, Ala, Glu and Gly are consistently lost. The same nine amino acids are currently accrued or lost in human proteins, as shown by analysis of non-synonymous single-nucleotide polymorphisms. All amino acids with declining frequencies are thought to be among the first incorporated into the genetic code; conversely, all amino acids with increasing frequencies, except Ser, were probably recruited late. Thus, expansion of initially under-represented amino acids, which began over 3,400 million years ago, apparently continues to this day.
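
The paralogy quotient quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes that PQ is simply the number of orthologous sets in the descendant ancestor divided by the number in the earlier ancestor, which is what the LECA and FECA counts imply; the function name is hypothetical and is not taken from the project.

```python
def paralogy_quotient(descendant_sets: int, ancestral_sets: int) -> float:
    """Ratio of orthologous gene sets in a later ancestor to those in an earlier one.

    Assumption: PQ is a plain ratio of the two reconstructed set counts.
    """
    if ancestral_sets <= 0:
        raise ValueError("ancestral_sets must be positive")
    return descendant_sets / ancestral_sets


# Counts reported in the abstract for the eukaryotic reconstruction.
leca_sets = 4137  # orthologous sets in the last eukaryotic common ancestor
feca_sets = 2150  # orthologous sets in the hypothetical first eukaryotic common ancestor

pq = paralogy_quotient(leca_sets, feca_sets)
print(f"Eukaryotic PQ: {pq:.2f}")
```

The abstract gives only the resulting quotients for archaea (1.19) and bacteria (1.25), not the underlying set counts, so those values cannot be recomputed here.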
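
The abstract states that substitutions were polarized using triplets of closely related genomes. One simple way to do that is outgroup parsimony: where two sister sequences differ and the outgroup agrees with one of them, the outgroup state is taken as ancestral. The sketch below illustrates that idea only; it is an assumption about the general approach, not the project's actual pipeline, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def polarize_substitutions(sister_a: str, sister_b: str, outgroup: str) -> Counter:
    """Count (ancestral, derived) residue pairs from three aligned protein sequences.

    Assumption: simple parsimony on a triplet. A site is polarized only when the
    two sisters differ and the outgroup matches exactly one of them.
    """
    if not (len(sister_a) == len(sister_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = Counter()
    for a, b, o in zip(sister_a, sister_b, outgroup):
        if "-" in (a, b, o) or a == b:
            continue  # gap, or no difference between the sisters
        if o == b:
            substitutions[(o, a)] += 1  # change occurred on the lineage to sister_a
        elif o == a:
            substitutions[(o, b)] += 1  # change occurred on the lineage to sister_b
        # all three residues differ: direction is ambiguous, so the site is skipped
    return substitutions


def net_gain(substitutions: Counter) -> Counter:
    """Net gain (positive) or loss (negative) per amino acid."""
    net = Counter()
    for (ancestral, derived), count in substitutions.items():
        net[derived] += count
        net[ancestral] -= count
    return net


# Toy alignment, invented for illustration only.
subs = polarize_substitutions("MKCA", "MKSA", "MKSP")
print(dict(net_gain(subs)))
```

Summing such net counts over many orthologous sets is the kind of tally that would reveal which amino acids accrue and which are lost, although a real analysis would need proper alignments and a test of statistical significance.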

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Eugene V Koonin

The common ancestry of life
  • DOI:
    10.1186/1745-6150-5-64
  • Published:
    2010-01-01
  • Journal:
  • Impact factor:
    4.900
  • Authors:
    Eugene V Koonin; Yuri I Wolf
  • Corresponding author:
    Yuri I Wolf
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
  • Corresponding author:
    Haruyuki Atomi
Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins
  • DOI:
    10.1038/ncomms13570
  • Published:
    2016-11-18
  • Journal:
  • Impact factor:
    15.700
  • Authors:
    Erez Persi; Yuri I Wolf; Eugene V Koonin
  • Corresponding author:
    Eugene V Koonin
Evolutionary primacy of sodium bioenergetics
  • DOI:
    10.1186/1745-6150-3-13
  • Published:
    2008-04-01
  • Journal:
  • Impact factor:
    4.900
  • Authors:
    Armen Y Mulkidjanian; Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
  • Corresponding author:
    Eugene V Koonin
Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
  • DOI:
    10.1186/1471-2164-3-8
  • Published:
    2002-03-21
  • Journal:
  • Impact factor:
    3.700
  • Authors:
    Lakshminarayan M Iyer; Eugene V Koonin; L Aravind
  • Corresponding author:
    L Aravind

Other grants held by Eugene V Koonin

Finding Protein Sequence Motifs--methods And Application
  • Grant number:
    6681337
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Finding Protein Sequence Motifs--Methods and Application
  • Grant number:
    6988455
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    7969213
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    8943217
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    9160910
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7735068
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    7594460
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Finding Protein Sequence Motifs--methods And Applications
  • Grant number:
    9555730
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6111075
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    6988458
  • Fiscal year:
  • Funding amount:
    --
  • Project category:

Similar grants from the National Natural Science Foundation of China (NSFC)

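Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
    not listed
  • Project category:
    Collaborative Innovation Research Team
Intelligent Patent Analysis for Optimized Technology Stack Selection: Blockchain BusinessRegistry Case Demonstration
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
    not listed
  • Project category:
    Research Fund for International Scientists
A meta-analysis-based model of irrigation-driven yield increase in Xinjiang cotton
  • Grant number:
    41601604
  • Approval year:
    2016
  • Funding amount:
    220,000 CNY
  • Project category:
    Young Scientists Fund
Meta-analysis methods for large-scale microarray datasets
  • Grant number:
    31100958
  • Approval year:
    2011
  • Funding amount:
    200,000 CNY
  • Project category:
    Young Scientists Fund
Elucidating the artemisinin biosynthetic pathway using retrobiosynthetic NMR analysis
  • Grant number:
    30470153
  • Approval year:
    2004
  • Funding amount:
    220,000 CNY
  • Project category:
    General Program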

Similar overseas grants

Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    7969213
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    9160910
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6111075
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    6988458
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6432755
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
  • Grant number:
    6554459
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    9362440
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    8943219
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    10927035
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
Comparative Analysis Of Completely Sequenced Genomes
  • Grant number:
    8149599
  • Fiscal year:
  • Funding amount:
    --
  • Project category: