Comparative Analysis Of Completely Sequenced Genomes
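Basic information
- Grant number: 10007522
- Principal investigator:
- Amount: $2.6382 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period: to
- Project status: Ongoing
- Source: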
- Keywords: Affect; Archaea; Bacteria; Biological; Capsid; Clustered Regularly Interspaced Short Palindromic Repeats; Complex; Computing Methodologies; Conflict (Psychology); Consensus; DNA Transposable Elements; Data; Databases; Early Diagnosis; Elements; Eukaryota; Evolution; Genes; Genome; Genomics; Goals; Growth; Guide RNA; Horizontal Gene Transfer; Individual; Investigation; Learning; Life; Malignant Neoplasms; Mediating; Metagenomics; Methodology; Methods; Microsatellite Repeats; Mobile Genetic Elements; Morphogenesis; Mutation; Neoplasm Metastasis; Normal tissue morphology; Organism; Orthologous Gene; Parasites; Patients; Pattern; Phylogeny; Planet Earth; Plasmids; Point Mutation; Primary Neoplasm; Probability; Process; Property; Prophages; Proteins; Proteome; Proteomics; Repetitive Sequence; Replicon; Research; Role; Series; Signal Transduction; Somatic Mutation; Structure; System; Time; TimeLine; Tumor Tissue; Variant; Viral Genome; Virion; Virus; accurate diagnosis; base; cancer genome; cancer genomics; clinical decision-making; comparative; genetic element; genome-wide; genomic signature; improved; insight; long short term memory; long short term memory network; mathematical methods; mathematical model; novel; outcome forecast; paralogous gene; protein expression; proteomic signature; recruit; recurrent neural network; replicator; simulation; theories; trend; tumor; tumor progression; tumorigenic; whole genome
Project abstract
The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (several thousand genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the year in review, we performed a variety of studies that took advantage of the genomic information to establish fundamental principles of genome evolution.
To a large extent, we have focused on cancer genome evolution. Cancer arises through the accumulation of somatic mutations over time. Understanding the sequence of mutation occurrence during cancer progression can assist early and accurate diagnosis and improve clinical decision-making. Here we employ long short-term memory (LSTM) networks, a class of recurrent neural network, to learn the evolution of a tumor through an ordered sequence of mutations. We demonstrate the capacity of LSTMs to learn complex dynamics of the mutational time series governing tumor progression, allowing accurate prediction of the mutational burden and the occurrence of mutations in the sequence. Using the probabilities learned by the LSTM, we simulate mutational data and show that the simulation results are statistically indistinguishable from the empirical data. We identify passenger mutations that are significantly associated with established cancer drivers in the sequence and demonstrate that the genes carrying these mutations are substantially enriched in interactions with the corresponding driver genes. Breaking the network into modules consisting of driver genes and their interactors, we show that these interactions are associated with poor patient prognosis, thus likely conferring growth advantage for tumor progression. Thus, application of LSTM provides for prediction of numerous additional conditional drivers and reveals hitherto unknown aspects of cancer evolution.
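As a rough illustration of how an LSTM turns an ordered sequence of mutations into a probability distribution over the next mutation, the sketch below runs the forward pass of a single LSTM cell in plain Python. It is not the model used in the project: the weights are random and untrained, the gene labels (`GENE_A`, `GENE_B`, `GENE_C`), the hidden size and all function names are hypothetical, and no training step is shown.

```python
import math
import random

GENES = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical placeholder labels
HIDDEN = 4                               # hypothetical hidden-state size


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(n_in, n_hidden, rng):
    """Random, UNTRAINED weights; a real model would learn these from data."""
    params = {}
    for gate in ("input", "forget", "output", "candidate"):
        weights = random_matrix(n_hidden, n_in + n_hidden, rng)
        params[gate] = (weights, [0.0] * n_hidden)
    params["readout"] = random_matrix(n_in, n_hidden, rng)
    return params


def lstm_step(x, h, c, params):
    """One LSTM time step: update the cell state c and hidden state h."""
    z = x + h  # concatenate the current input with the previous hidden state

    def gate(name, activation):
        weights, bias = params[name]
        return [activation(p + b) for p, b in zip(matvec(weights, z), bias)]

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much old cell state to keep
    o = gate("output", sigmoid)       # how much cell state to expose
    g = gate("candidate", math.tanh)  # proposed new cell content
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_mutation_probs(ordered_mutations, params):
    """Feed one-hot encoded mutations in order; return P(next mutated gene)."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for gene in ordered_mutations:
        x = [1.0 if gene == name else 0.0 for name in GENES]
        h, c = lstm_step(x, h, c, params)
    return softmax(matvec(params["readout"], h))


params = init_params(len(GENES), HIDDEN, random.Random(0))
probs = next_mutation_probs(["GENE_A", "GENE_C"], params)
```

With trained weights, sampling repeatedly from `probs` and feeding each sampled gene back into the network is one way simulated mutation sequences could be generated from the learned probabilities.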
In another cancer genomics project, we explored proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection.
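The idea of quantifying repeat content directly from raw sequence data can be sketched with a toy motif counter. This is only an assumption-laden illustration, not the project's methodology: the function names, the `min_copies` threshold and the example reads are all hypothetical, and a real analysis would handle many motifs, sequencing errors and normalization.

```python
import re


def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile("(?:" + re.escape(motif) + ")+")
    best = 0
    for match in pattern.finditer(sequence):
        best = max(best, len(match.group(0)) // len(motif))
    return best


def repeat_content(reads, motif, min_copies=2):
    """Fraction of reads carrying at least min_copies tandem copies of motif."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if longest_tandem_run(read, motif) >= min_copies)
    return hits / len(reads)


def repeat_shift(tumor_reads, normal_reads, motif):
    """Difference in repeat content between a tumor sample and matched normal."""
    return repeat_content(tumor_reads, motif) - repeat_content(normal_reads, motif)


# Made-up reads for illustration only.
tumor = ["CAGCAG", "ACGT", "CAGTCAG", "TTCAGCAGCAG"]
normal = ["CAGCAG", "ACGT", "ACGT", "TTTT"]
shift = repeat_shift(tumor, normal, "CAG")
```

A positive `shift` would indicate that the motif occurs in tandem more often in the tumor reads than in the matched normal reads under this simplified measure.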
Additionally, we have continued intensive research into evolutionary genomics of viruses and antivirus defense systems. In particular, we carried out a detailed investigation of CRISPR-Cas systems encoded in mobile genetic elements and involved in counter-defence and other functions. The principal function of CRISPR-Cas systems in archaea and bacteria is defence against mobile genetic elements (MGEs), including viruses, plasmids and transposons. However, the relationships between CRISPR-Cas and MGEs are far more complex. Several classes of MGE contributed to the origin and evolution of CRISPR-Cas, and, conversely, CRISPR-Cas systems and their components were recruited by various MGEs for functions that remain largely uncharacterized. We investigated and substantially expanded the range of CRISPR-Cas components carried by MGEs. Three groups of Tn7-like transposable elements encode 'minimal' type I CRISPR-Cas derivatives capable of target recognition but not cleavage, and another group encodes an inactivated type V variant. These partially inactivated CRISPR-Cas variants might mediate guide RNA-dependent integration of the respective transposons. Numerous plasmids and some prophages encode type IV systems, with similar predicted properties, that appear to contribute to competition among plasmids and between plasmids and viruses. Many prokaryotic viruses also carry CRISPR mini-arrays, some of which recognize other viruses and are implicated in inter-virus conflicts, and solitary repeat units, which could inhibit host CRISPR-Cas systems.
We have also developed a general theory of the origin of viruses from primordial replicators that recruited various cellular proteins for capsid formation. Viruses are ubiquitous parasites of cellular life and the most abundant biological entities on Earth. It is widely accepted that viruses are polyphyletic, but a consensus scenario for their ultimate origin is still lacking. Traditionally, three scenarios for the origin of viruses have been considered: descent from primordial, precellular genetic elements; reductive evolution from cellular ancestors; and escape of genes from cellular hosts, achieving partial replicative autonomy and becoming parasitic genetic elements. These classical scenarios give different timelines for the origin(s) of viruses and do not explain the provenance of the two key functional modules that are responsible, respectively, for viral genome replication and virion morphogenesis. We developed a 'chimeric' scenario under which different types of primordial, selfish replicons gave rise to viruses by recruiting host proteins for virion formation. We also propose that new groups of viruses have repeatedly emerged at all stages of the evolution of life, often through the displacement of ancestral structural and genome replication genes.
Taken together, these studies advance the existing understanding of the general principles and specific aspects of genome evolution in diverse life forms, in particular, viruses and mobile elements, as well as cancer genome evolution.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eugene V Koonin
The common ancestry of life
- DOI: 10.1186/1745-6150-5-64
- Published: 2010-01-01
- Journal:
- Impact factor: 4.900
- Authors: Eugene V Koonin; Yuri I Wolf
- Corresponding author: Yuri I Wolf
Identification of dephospho-CoA kinase in Thermococcus kodakarensis and the complete CoA biosynthesis pathway
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Takahiro Shimosaka; Kira S Makarova; Eugene V Koonin; Haruyuki Atomi
- Corresponding author: Haruyuki Atomi
Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins
- DOI: 10.1038/ncomms13570
- Published: 2016-11-18
- Journal:
- Impact factor: 15.700
- Authors: Erez Persi; Yuri I. Wolf; Eugene V Koonin
- Corresponding author: Eugene V Koonin
Evolutionary primacy of sodium bioenergetics
- DOI: 10.1186/1745-6150-3-13
- Published: 2008-04-01
- Journal:
- Impact factor: 4.900
- Authors: Armen Y Mulkidjanian; Michael Y Galperin; Kira S Makarova; Yuri I Wolf; Eugene V Koonin
- Corresponding author: Eugene V Koonin
Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52
- DOI: 10.1186/1471-2164-3-8
- Published: 2002-03-21
- Journal:
- Impact factor: 3.700
- Authors: Lakshminarayan M Iyer; Eugene V Koonin; L Aravind
- Corresponding author: L Aravind
Other grants held by Eugene V Koonin
Finding Protein Sequence Motifs--methods And Application
- Grant number: 6681337
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Finding Protein Sequence Motifs--Methods and Application
- Grant number: 6988455
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 7969213
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 8943217
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 9160910
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7735068
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 7594460
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Finding Protein Sequence Motifs--methods And Applications
- Grant number: 9555730
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
- Grant number: 6111075
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Comparative Analysis Of Completely Sequenced Genomes
- Grant number: 6988458
- Fiscal year:
- Funding amount: $2.6382 million
- Project category:
Similar overseas grants
Innovative strategy for cultivation of CPR bacteria and DPANN archaea
- Grant number: 22K19141
- Fiscal year: 2022
- Funding amount: $2.6382 million
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
CAREER: Elucidating the Interaction(s) Between Bacteria and Archaea in a Biocathode
- Grant number: 2145902
- Fiscal year: 2022
- Funding amount: $2.6382 million
- Project category: Continuing Grant
Development of a novel co-culture model of NH4-tolerant propionate-oxidizing bacteria and methane-producing archaea for the recovery of CH4 under high NH4+
- Grant number: 22K12428
- Fiscal year: 2022
- Funding amount: $2.6382 million
- Project category: Grant-in-Aid for Scientific Research (C)
Identification of ammonia oxidizing bacteria and archaea within industrial wastewater treatment system bioreactors
- Grant number: 510279-2017
- Fiscal year: 2017
- Funding amount: $2.6382 million
- Project category: Engage Grants Program
Borrowing building blocks from bacteria and eukaryotes: a three-component DNA segregation machinery in archaea
- Grant number: 1949055
- Fiscal year: 2017
- Funding amount: $2.6382 million
- Project category: Studentship
Borrowing building blocks from bacteria and eukaryotes: a three-component DNA segregation machinery in archaea
- Grant number: 1947068
- Fiscal year: 2017
- Funding amount: $2.6382 million
- Project category: Studentship
Understanding Horizontal Gene Transfer in Bacteria and Archaea: Units of Transfer and Modes of Integration
- Grant number: 1616514
- Fiscal year: 2016
- Funding amount: $2.6382 million
- Project category: Continuing Grant
Metabolic Syntrophy Between Human Gut Bacteria and Archaea
- Grant number: 10016363
- Fiscal year: 2016
- Funding amount: $2.6382 million
- Project category:
A study on new technology for treating sulfate-containing wastewater by combined control of anaerobic methanogenic archaea and sulfur metabolic bacteria
- Grant number: 26550056
- Fiscal year: 2014
- Funding amount: $2.6382 million
- Project category: Grant-in-Aid for Challenging Exploratory Research
Evolvability of thermophilic proteins from archaea and bacteria
- Grant number: 25440194
- Fiscal year: 2013
- Funding amount: $2.6382 million
- Project category: Grant-in-Aid for Scientific Research (C)