Detecting Homology in the "Twilight Zone" of Sequence Similarity
Basic Information
- Award number: 7799248
- Principal investigator:
- Amount: $141,600
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2009
- Funding country: United States
- Project period: 2009-04-10 to 2011-02-01
- Project status: Completed
- Source:
- Keywords: Algorithms, Amino Acid Sequence, Area, Benchmarking, Blast Cell, Case Study, Characteristics, Complex, Computational Technique, Computers, Data, Data Set, Databases, Detection, Development, Disease, Evolution, Explosion, Family, Fingerprint, Grant, Introns, Laboratories, Length, Location, Manuals, Maps, Measurement, Measures, Methods, Modeling, Peptide Sequence Determination, Performance, Phylogenetic Analysis, Plant Roots, Plasmids, Polymerase, Process, Protein Engineering, Proteins, RNA Viruses, RNA-Directed RNA Polymerase, Recording of previous events, Regulatory Element, Reporting, Research, Resolution, Resources, Retroelements, Scientist, Sequence Alignment, Set protein, Speed, Structural Models, Structure, Translational Research, Trees, Viral, Virus, Work, arm, base, clinically relevant, combat, domain mapping, falls, insight, knowledge base, pharmacophore, protein structure, public health relevance, research study, simulation, therapy development, tool, user-friendly, viral RNA
Project Summary
DESCRIPTION (provided by applicant): The "protein problem" has remained unsolved despite decades of research [1, 2]. In principle, a protein's primary amino acid sequence should determine its structure, function, and evolutionary (SF&E) characteristics. Yet there is still no reliable method for predicting a protein's native-state structure and function from its sequence alone. Inferring evolutionary relationships among highly divergent protein sequences is likewise a daunting task: despite intensive research in this area, when pairwise alignments between protein sequences fall below 25% identity, statistical measures generally do not provide support robust enough to establish clear phylogenetic relationships [1, 3, 4]. The recent explosion in available knowledge bases and in computational techniques for analyzing complex data has created an unprecedented opportunity to tease invaluable information out of protein sequences. Starting from the basic premise that a protein's sequence encodes information about its SF&E, we developed a unified, knowledge-based framework for inferring SF&E from sequence information, in which we measure, in an unbiased manner, the similarity between a query sequence and a set of biologically relevant profiles. The results of this Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) are phylogenetic profiles capable of modeling the SF&E relationships of diverse proteins. Indeed, GDDA-BLAST can derive deep phylogenetic relationships for highly divergent proteins in a quantifiable manner [5, 6].
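The 25% threshold above depends on how percent identity is computed, and conventions differ (for example, whether gap columns or the shorter sequence's length form the denominator). As an illustration only, the Python sketch below assumes one common convention: identity is counted over alignment columns where neither sequence has a gap. The function names, constants, and example sequences are hypothetical and are not part of GDDA-BLAST.

```python
GAP = "-"
# Threshold cited in the project summary; the identity convention below is an assumption.
TWILIGHT_ZONE_IDENTITY = 25.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumed convention: gap columns are excluded from the denominator.
    Other conventions (e.g. dividing by the shorter sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == GAP or y == GAP:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def in_twilight_zone(aligned_a: str, aligned_b: str,
                     threshold: float = TWILIGHT_ZONE_IDENTITY) -> bool:
    """True when the aligned pair falls below the identity threshold."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    print(percent_identity("MKV-LA", "MRVQLG"))  # 3 identical of 5 gap-free columns -> 60.0
    print(in_twilight_zone("ACDEFGHIKL", "ACWWWWWWWW"))  # 2 of 10 identical -> True
```

A pair such as the second example, at 20% identity, is the kind of case in which, per the summary, conventional alignment statistics no longer give robust phylogenetic support.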
Preliminary results from our computational case study of the highly divergent retroelement family agree with previously reported findings, and demonstrate that GDDA-BLAST measurements can be treated as "fingerprints" from which distance estimates, and hence phylogenetic relationships, can be derived without prior information, multiple sequence alignment, or manual editing. We propose that the sequence information present within the "twilight zone" of sequence similarity can provide key insight into SF&E relationships among distantly related and/or rapidly evolving proteins. This proposal aims to push the limits of homology detection within the "twilight zone" by evaluating and optimizing GDDA-BLAST performance on benchmark and experimental data sets. Armed with these refined GDDA-BLAST measurements, we propose to conduct a comprehensive, ab initio phylogenetic study of retroelements and of the RNA-dependent RNA polymerases of the positive-strand RNA viruses (+ssRNA). In parallel, we will derive high-resolution maps of domain boundaries and empirically validate functional annotations and predictions of the key residues responsible for those activities. This work aims to carry translational research from the computer to the laboratory bench. We expect the tools and resources generated by this grant to be accessible and user-friendly for bench scientists, thereby speeding discovery in other clinically relevant research endeavors.

PUBLIC HEALTH RELEVANCE: The long-term implication of this proposal is the development of a unified framework for simultaneous, high-resolution measurement of structure, function, and evolution. Should this be possible, (i) functional and evolutionary measurements could quantitatively inform structural modeling to derive accurate atomic-resolution protein structures; (ii) structural and functional measurements could inform evolutionary histories to derive accurate evolutionary rates, deep-branch relationships, and homologous spaces within each protein; and (iii) structural and evolutionary measurements would reveal the locations of the functionalities contained within any protein and of the regulatory elements that control them. With this information, the speed at which diseases are understood, and at which pharmacophores and therapies are developed to combat them, would likely increase dramatically.
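The summary states that GDDA-BLAST "fingerprints" yield distance estimates and phylogenetic relationships without a multiple sequence alignment, but it does not specify the distance metric or tree-building method. The Python sketch below illustrates the general idea under loudly stated assumptions: each protein is represented by a vector of scores against the same ordered set of profiles (the values and protein names are invented), the distance between two fingerprints is taken as 1 minus their Pearson correlation, and the tree is built with UPGMA, a simple average-linkage clustering swapped in here for illustration. The actual study may use different scores, metrics, or tree methods.

```python
import math
from itertools import combinations


def fingerprint_distance(u, v):
    """Assumed metric: 1 - Pearson correlation of two profile-score vectors.

    Returns a value in [0, 2]; 0 means perfectly correlated fingerprints.
    """
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("fingerprints must have equal length >= 2")
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        return 1.0  # constant vector: treat as uncorrelated
    return 1.0 - cov / (sd_u * sd_v)


def _pair(x, y):
    """Unordered key for the distance table."""
    return frozenset((x, y))


def upgma(fingerprints):
    """Build a rooted tree (nested tuples of names) from named fingerprints with UPGMA."""
    clusters = {name: (name, 1) for name in fingerprints}  # key -> (subtree, leaf count)
    dist = {
        _pair(a, b): fingerprint_distance(fingerprints[a], fingerprints[b])
        for a, b in combinations(fingerprints, 2)
    }
    step = 0
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters with the smallest distance
        a, b = sorted(closest)
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        del dist[closest]
        merged = f"(merged {step})"
        step += 1
        # UPGMA update: leaf-count-weighted average of the two old distances.
        for c in clusters:
            d_ac = dist.pop(_pair(a, c))
            d_bc = dist.pop(_pair(b, c))
            dist[_pair(merged, c)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Hypothetical scores of four proteins against five hypothetical profiles.
    scores = {
        "RT_A": [9.0, 8.0, 1.0, 0.5, 0.2],
        "RT_B": [8.5, 7.5, 1.2, 0.4, 0.3],
        "RdRp_A": [0.5, 1.0, 7.0, 9.0, 8.0],
        "RdRp_B": [0.7, 0.8, 6.5, 8.5, 8.8],
    }
    print(upgma(scores))
```

With these invented scores, the two "RT" fingerprints and the two "RdRp" fingerprints each form a clade. Correlation-based distance was chosen so that proteins with the same pattern of profile hits cluster together even if their overall score magnitudes differ.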
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by RANDEN LEE PATTERSON
Detecting Homology in the "Twilight Zone" of Sequence Similarity
- Award number: 8243153
- Fiscal year: 2009
- Amount: $141,600
- Project category:
Detecting Homology in the "Twilight Zone" of Sequence Similarity
- Award number: 8055951
- Fiscal year: 2009
- Amount: $141,600
- Project category:
Detecting Homology in the "Twilight Zone" of Sequence Similarity
- Award number: 8288082
- Fiscal year: 2009
- Amount: $141,600
- Project category:
The Identity/Role of IP3 Receptor Associated Proteins
- Award number: 6446438
- Fiscal year: 2002
- Amount: $141,600
- Project category:
The Identity/Role of IP3 Receptor Associated Proteins
- Award number: 6660382
- Fiscal year: 2002
- Amount: $141,600
- Project category:
The Identity/Role of IP3 Receptor Associated Proteins
- Award number: 6643309
- Fiscal year: 2002
- Amount: $141,600
- Project category:
Similar Overseas Grants
Cerebral infarction treatment strategy using collagen-like "triple helix peptide" containing functional amino acid sequence
- Award number: 23K06972
- Fiscal year: 2023
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of a screening method for functional microproteins independent of amino acid sequence conservation
- Award number: 23KJ0939
- Fiscal year: 2023
- Amount: $141,600
- Project category: Grant-in-Aid for JSPS Fellows
Effects of amino acid sequence and lipids on the structure and self-association of transmembrane helices
- Award number: 19K07013
- Fiscal year: 2019
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of electron-transfer amino acid sequence probe with an interaction for protein and cell
- Award number: 16K05820
- Fiscal year: 2016
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)
Development of artificial antibody of anti-bitter taste receptor using random amino acid sequence library
- Award number: 16K08426
- Fiscal year: 2016
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)
The aa15-17 amino acid sequence in the terminal protein domain of HBV polymerase as a viral factor affecting in vivo as well as in vitro replication activity of the virus.
- Award number: 25461010
- Fiscal year: 2013
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)
Amino acid sequence analysis of fossil proteins using mass spectrometry
- Award number: 23654177
- Fiscal year: 2011
- Amount: $141,600
- Project category: Grant-in-Aid for Challenging Exploratory Research
Precise hybrid synthesis of glycoprotein through amino acid sequence-specific introduction of oligosaccharide followed by enzymatic transglycosylation reaction
- Award number: 22550105
- Fiscal year: 2010
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)
Estimating selection on amino-acid sequence polymorphisms in Drosophila
- Award number: NE/D00232X/1
- Fiscal year: 2006
- Amount: $141,600
- Project category: Research Grant
Construction of a neural network for detecting novel domains from amino acid sequence information only
- Award number: 16500189
- Fiscal year: 2004
- Amount: $141,600
- Project category: Grant-in-Aid for Scientific Research (C)