
COMPUTER ANALYSIS OF LOW-COMPLEXITY AMINO ACID AND NUCLEOTIDE SEQUENCES

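Basic information

Grant number:
6111060
Principal investigator:
JOHN C. WOOTTON
Amount:
--
Host institution:
NATIONAL LIBRARY OF MEDICINE
Host institution country:
United States
Project category:
Fiscal year:
Funding country:
United States
Start and end dates:
to
Project status:
Not concluded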

Source:
https://reporter.nih.gov/project-details/6111060
Keywords:
chemical information system; computer assisted sequence analysis; computer data analysis; computer system design /evaluation; nucleic acid repetitive sequence; nucleic acid sequence; protein sequence; protein structure function; statistics /biometry

Project abstract

The goal of this project is to define, classify and analyze, by computational methods, segments of protein and nucleotide sequences that show compositional bias or improbably low compositional complexity. In protein sequences, these include the abundant residue clusters of predominantly one or a few amino acid types, which commonly contain homopolymeric tracts or mosaics of these, aperiodic patterns and sections of low-period repeats. Other common examples include long non-globular domains. The abundance of biased segments in both amino acid and nucleotide sequence databases has been determined, and their properties are being related to evidence of biological functions. Different formal definitions of local compositional complexity were used to identify low-complexity segments without bias, at different levels of stringency. Algorithms were refined to (a) select segments for further study, (b) filter out non-informative segments prior to database searches, and (c) discover and analyze regions in which compositional bias is present in periodically spaced rather than contiguous residues. New methods for automated classification and neighboring of low-complexity sequences have been developed. B. Abundance and biological properties: Approximately 25% of the residues in protein databases are in compositionally biased segments (including some known long non-globular regions) and approximately 55% of proteins contain one or more such segments. Interspersed low-complexity sequences are particularly abundant in many eukaryotic proteins crucial in morphogenesis and embryonic development, RNA processing, transcriptional regulation, signal transduction and aspects of cellular and extracellular structural integrity. The limited structural information available for low-complexity regions of proteins indicates that they are generally non-globular and polymorphic or mobile.
The project is highlighting the high abundance and biological importance of low-complexity protein segments. Knowledge of their molecular structure and dynamics is beginning to emerge, but only in a minority of cases. This is a priority area for future research. The methods recently developed to analyze nucleotide sequences are revealing many new and intricate compositional features. These methods are valuable in eliminating many artifacts in sequence database searches and alignment analysis.
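
The abstract mentions formal definitions of local compositional complexity and the filtering of non-informative segments before database searches, but does not say which definitions the project used. As a minimal illustrative sketch, the Python below assumes one common choice: the Shannon entropy of residue composition in a sliding window. The function names `window_entropy` and `mask_low_complexity`, the window length of 12, the threshold of 2.2 bits and the mask character `x` are all hypothetical values chosen for illustration, not parameters taken from the project.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        threshold: float = 2.2, mask_char: str = "x") -> str:
    """Mask every residue covered by a window whose entropy is <= threshold.

    window, threshold and mask_char are illustrative assumptions.
    """
    if len(seq) < window:
        return seq  # too short to evaluate; returned unchanged
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else ch for ch, m in zip(seq, masked))


if __name__ == "__main__":
    diverse = "ACDEFGHIKLMNPQRSTVWY"  # 20 distinct residues, high complexity
    example = diverse + "Q" * 15 + diverse  # homopolymeric tract in the middle
    print(mask_low_complexity(example))
```

Because a window that only partly overlaps a homopolymeric tract can still fall below the threshold, this simple scheme also masks some flanking residues. Stricter or looser thresholds trade sensitivity against over-masking, which corresponds loosely to the "different levels of stringency" described above.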
