Computer Analysis Of Low-complexity Amino Acid And Nucleotide Sequences
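Basic information
- Grant number: 8149593
- Principal investigator:
- Amount: $293,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source:
- Keywords: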
Project abstract
We are investigating segments of protein and nucleotide sequences that show compositional bias and raise several challenges for computational analysis. We develop methods to help understand the structural, functional, and evolutionary significance of these regions and their pathology. The sequences include local low-complexity regions or domains, including conformationally mobile or intrinsically unstructured regions of proteins, and tandemly repeated sequences. Further problems arise from more generally distributed amino acid content biases, which can reflect directional mutation pressures at the genomic level and constraints specific to protein or domain function.
Low-complexity regions comprise a large proportion of genome-encoded amino acids. They may contain homopolymeric tracts, mosaics of a few amino acids, or repeated patterns that are frequently subtle, including those typical of many non-globular domains and of dynamic or intrinsically unstructured segments of proteins. We have developed mathematical definitions and algorithms to define and identify regions of compositional bias, and to discover and analyze properties of these regions relevant to their structures, interactions, and evolution. Local regions of low complexity and tandemly repeated amino acid sequences occur in many proteins involved in cellular differentiation and embryonic development, RNA processing, transcriptional regulation, signal transduction, and aspects of cellular and extracellular structural integrity. Such segments of proteins are commonly non-globular, intrinsically unstructured, or conformationally mobile; however, knowledge of the molecular structures and dynamics of these domains is still very limited. They are generally relatively intractable to investigation by crystallography and NMR, and they still account for less than 1% of the residues in three-dimensional structural databases. Current computer methods based on molecular mechanics and dynamics have given inconsistent results when applied to low-complexity amino acid sequences. Accordingly, we are experimenting with ab initio quantum chemical methods to investigate the ensembles of conformational states accessible to these regions of proteins. As specific examples to motivate this development, we are investigating amino acid sequence repeats of malaria parasites with possible roles in immune evasion and as components of malaria vaccines.
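As a rough illustration of what identifying a region of compositional bias can mean in practice, the sketch below scores fixed-length windows by the Shannon entropy of their residue composition and merges overlapping low-entropy windows. This is a generic textbook-style approach, not the definitions or algorithms developed in this project; the function names, the window length of 12, and the threshold of 2.2 bits are assumptions chosen only for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    # Sum p * log2(1/p) over the residue types present in the segment.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return half-open (start, end) spans built from windows with entropy <= threshold.

    The window length and threshold are illustrative assumptions, not tuned values.
    """
    spans = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            if spans and i <= spans[-1][1]:
                # Overlapping or adjacent to the previous span: extend it.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans


if __name__ == "__main__":
    # Hypothetical sequence: a glutamine homopolymer flanked by mixed residues.
    example = "MKVLAT" + "Q" * 15 + "DERGHW"
    for start, end in low_complexity_windows(example):
        print(start, end, example[start:end])
```

A homopolymeric tract scores 0 bits, while a window in which every residue is different scores the maximum for its length, so the threshold sets how strongly biased a window must be before it is reported. Subtle repeats and mosaics of a few residues fall between these extremes, which is one reason a single fixed threshold is only a crude stand-in for a proper definition.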
A related problem is compositional bias that is not confined to local segments but is distributed over the entire genome or the proteins of an organism. This is shown, for example, by the biases in codons or proteins encoded by very AT-rich or GC-rich genomes, including those of several important infectious disease organisms. Such variation and bias in genome-wide amino acid and nucleotide compositions raise problems for several commonly used sequence analysis algorithms. Accordingly, current research with Stephen Altschul and Yi-Kuo Yu is further developing the theoretical foundation and implementation of these algorithms in ways that include an improved treatment of background frequencies.
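To make the role of background frequencies concrete, the sketch below computes standard log-odds substitution scores, s(a, b) = log2(q_ab / (p_a p_b)), for a toy two-letter alphabet and evaluates the same target frequencies q against two different background compositions p. The alphabet ("W" for A/T and "S" for G/C), all the frequencies, and the function name are assumptions made up for this example; it does not reproduce the treatment developed in this project.

```python
import math


def log_odds_scores(target, background, scale=1.0):
    """Log-odds scores s(a, b) = scale * log2(q_ab / (p_a * p_b)).

    target maps residue pairs to target frequencies q_ab;
    background maps residues to background frequencies p_a.
    """
    return {
        (a, b): scale * math.log2(q / (background[a] * background[b]))
        for (a, b), q in target.items()
    }


# Hypothetical target frequencies for aligned pairs (they sum to 1).
target = {("W", "W"): 0.4, ("W", "S"): 0.1, ("S", "W"): 0.1, ("S", "S"): 0.4}

# Two hypothetical background compositions: unbiased and strongly AT-rich.
uniform = log_odds_scores(target, {"W": 0.5, "S": 0.5})
at_rich = log_odds_scores(target, {"W": 0.8, "S": 0.2})

if __name__ == "__main__":
    for pair in target:
        print(pair, round(uniform[pair], 3), round(at_rich[pair], 3))
```

Against the unbiased background a W-W match scores positively, but against the AT-rich background the same pair is expected so often by chance that its score turns negative. This is the kind of mismatch that arises when scores derived under one composition are applied to sequences with a very different one.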
Project outputs
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The construction and use of log-odds substitution scores for multiple sequence alignment.
- DOI: 10.1371/journal.pcbi.1000852
- Published: 2010-07-15
- Journal:
- Impact factor: 4.3
- Authors: Altschul SF; Wootton JC; Zaslavsky E; Yu YK
- Corresponding author: Yu YK
Other grants by JOHN C. WOOTTON
Computational Biology and Genetics Of Malaria Parasites
- Grant number: 6681329
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computational Biology and Genetics Of Malaria and Toxoplasma Parasites
- Grant number: 7969203
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computational Biology and Genetics Of Malaria and Toxopl
- Grant number: 7316231
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computational Biology and Genetics Of Malaria Parasites
- Grant number: 6988451
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computer Analysis Of Low-complexity Amino Acid And Nucle
- Grant number: 7316230
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computational Biology and Genetics Of Malaria Parasites
- Grant number: 6843563
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computer Analysis Of Low-complexity Amino Acid And Nucleotide Sequences
- Grant number: 7735065
- Fiscal year:
- Funding amount: $293,800
- Project category:
Analysis-Low-complexity Amino Acid-Nucleotide Sequences
- Grant number: 7148025
- Fiscal year:
- Funding amount: $293,800
- Project category:
Computer Analysis Of Low-complexity Amino Acid And Nucleotide Sequences
- Grant number: 7594457
- Fiscal year:
- Funding amount: $293,800
- Project category:
Similar overseas grants
Quantum chemical challenge to elucidate the functional mechanism of base sequence specificity deciding removal of the DNA damage
- Grant number: 19K22903
- Fiscal year: 2019
- Funding amount: $293,800
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Theoretical Study on Relation of Base sequence and Electronic Structures toward Elucidation of Mechanism of DNA Electric Conductivity.
- Grant number: 16K05666
- Fiscal year: 2016
- Funding amount: $293,800
- Project category: Grant-in-Aid for Scientific Research (C)
Prediction and control of base sequence recognition ability for nucleic acid binding proteins by using computer experiments.
- Grant number: 14598001
- Fiscal year: 2002
- Funding amount: $293,800
- Project category: Grant-in-Aid for Scientific Research (C)
FLANKING BASE SEQUENCE ON MUTAGENICITY OF 8 OXOGUANINE
- Grant number: 6362773
- Fiscal year: 2001
- Funding amount: $293,800
- Project category:
FLANKING BASE SEQUENCE ON MUTAGENICITY OF 8 OXOGUANINE
- Grant number: 6137753
- Fiscal year: 2000
- Funding amount: $293,800
- Project category:
GROWTH HOROMON LOCALIZATION AND ITS BASE SEQUENCE IN BOVINE PANCREATIC
- Grant number: 10460134
- Fiscal year: 1998
- Funding amount: $293,800
- Project category: Grant-in-Aid for Scientific Research (B)
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Grant number: 2488608
- Fiscal year: 1997
- Funding amount: $293,800
- Project category:
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Grant number: 6475917
- Fiscal year: 1997
- Funding amount: $293,800
- Project category:
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Grant number: 6329024
- Fiscal year: 1997
- Funding amount: $293,800
- Project category:
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Grant number: 6124462
- Fiscal year: 1997
- Funding amount: $293,800
- Project category: