
COMPUTER ANALYSIS OF LOW-COMPLEXITY AMINO ACID AND NUCLEOTIDE SEQUENCES

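Basic Information

Grant number:
6290481
Principal investigator:
JOHN C. WOOTTON
Amount:
--
Host institution:
NATIONAL LIBRARY OF MEDICINE
Host institution country:
United States
Project category:
Fiscal year:
Funding country:
United States
Project period:
to
Project status:
Ongoing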

Source:
https://reporter.nih.gov/project-details/6290481
Keywords:
chemical information system; computer assisted sequence analysis; computer data analysis; computer system design/evaluation; nucleic acid repetitive sequence; nucleic acid sequence; protein sequence; protein structure function; statistics/biometry

Project Abstract

The goal of this project is to define and analyze, using computational methods, segments of protein and nucleotide sequences showing compositional bias (low-complexity regions or domains) and to understand their structural, functional and evolutionary significance, and their pathology. In protein sequences, these regions comprise a large proportion of the genome-encoded amino acids (approximately 25% in most eukaryotes, and most of the translated protein sequences contain at least one such region). They may contain homopolymeric tracts or mosaics of a few amino acids, or repeated patterns, frequently subtle, including those typical of many non-globular domains. New mathematical definitions and algorithms continue to be developed for the unbiased identification of low-complexity segments, and to discover and analyze properties of these regions relevant to their structures, interactions and biological functions. Interspersed low-complexity sequences are particularly abundant in many eukaryotic proteins crucial in morphogenesis and embryonic development, RNA processing, transcriptional regulation, signal transduction and aspects of cellular and extracellular structural integrity. Structural data indicate that low-complexity segments of proteins are generally non-globular or conformationally mobile. However, knowledge of the molecular structures and dynamics of these domains is still very limited because they are generally relatively intractable to investigation by crystallography and NMR, and they account for less than 1% of the residues in current structural databases. Hence, mathematically rigorous sequence analysis provides a primary methodology for gaining insights into their biology, and for raising questions to be investigated experimentally. These methods are also valuable, for both nucleotide and amino acid sequences, in detecting and eliminating some artifacts in sequence database searches and alignment analysis.
- Computer algorithms, protein sequences, protein complexes, domains, complexity, patterns, repeats, non-globular structure, conformational mobility
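
The abstract does not specify the project's definitions or algorithms, so the following is only a minimal sketch of one common way to flag compositionally biased segments: computing the Shannon entropy of residue composition in a sliding window and merging the windows that fall below a cutoff. The function names, the window length of 12, and the threshold of 2.2 bits are illustrative assumptions, not values taken from this project.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits per residue, of the composition of `window`."""
    counts = Counter(window)
    n = len(window)
    # A homopolymer scores 0; a window of all-distinct residues scores log2(n).
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_segments(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged half-open (start, end) intervals covered by low-entropy windows.

    `window` and `threshold` are hypothetical defaults chosen for illustration.
    """
    segments = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a polyglutamine tract flanked by diverse residues.
    toy = "ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF"
    for start, end in low_complexity_segments(toy):
        print(start, end, toy[start:end])
```

Because windows that only partly overlap a biased tract can still score below the cutoff, the reported interval may extend a few residues into the flanking sequence. The same functions work on nucleotide strings, although a four-letter alphabet has a maximum entropy of 2 bits, so the threshold would need to be lowered accordingly.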
