Advanced correlation analyses to infer sequence and structural determinants of protein function
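Basic Information
- Award number: 10093067
- Principal investigator:
- Amount: $309,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-02-01 to 2023-01-31
- Project status: Completed
- Source: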
- Keywords: Acetyltransferase; Base Sequence; Bayesian Analysis; Benchmarking; Biochemical; Biological; Biomedical Research; Caring; Catalysis; Code; Collaborations; Common Core; Correlation Studies; Coupling; DNA Repair Endonuclease; Data; Data Set; Databases; Dependence; Dimerization; Drug Design; Ensure; Evaluation; Feedback; Formulation; Goals; Guanosine Triphosphate Phosphohydrolases; Health; Human; Hydrogen Bonding; Individual; Investigation; Joints; Link; Measures; Mediating; Methods; Mind; Modeling; Molecular; Molecular Biology; Pattern; Performance; Phosphoric Monoester Hydrolases; Positioning Attribute; Process; Property; Protein Engineering; Proteins; Quality Control; Reliability of Results; Research; Research Personnel; Role; Sampling; Sensitivity and Specificity; Sequence Alignment; Specificity; Speed; Statistical Models; Structure; Subgroup; System; Testing; Time; Validation; base; follow-up; human disease; improved; innovation; inositol-1,4,5-trisphosphate 5-phosphatase; insight; interest; member; open source; personalized medicine; programs; protein function; three dimensional structure; tool
PROJECT SUMMARY
A long-term goal of molecular biology is assigning functional and mechanistic roles to specific protein residues,
beyond the obvious roles in catalysis. Although this task is hindered by the relative sparsity of experimentally
based sequence annotations, it is facilitated by an abundance of sequence data augmented by structural data.
This has spurred sequence- and structure-based prediction of function-determining residues using a wide
variety of methods. However, by focusing on experimentally characterized functions, these methods disfavor
recognition of residues involved in important uncharacterized functions, insofar as these will be benchmarked
incorrectly as false positives. Instead, this project focuses more generally on inferring functionally relevant
residues (FRRs) by allowing the sequence data itself to reveal its most statistically surprising properties
without making assumptions about what will be found. We argue that, in the absence of experimental
annotations, it is only possible to directly link individual residues to other residues and such residue sets to
structural features. This project will make such associations by identifying sequence-to-sequence and
sequence-to-structure correlations, and will focus solely on the observed data rather than on predicting
(unseen) biochemical properties. The goal is to obtain hypothesis-generating observations for experimental
follow-up. Aim 1 will create advanced tools for characterizing correlated residue patterns due to functional
divergence, with each pattern consisting of an arbitrary number of residues. Aim 2 will develop a tool to
probabilistically assess correlations between independent sequence- and structurally-defined residue sets.
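The summary does not say how the Aim 2 tool models these correlations. As a minimal sketch only, assuming the simplest possible null model (the structurally defined set is a uniformly random draw from the same N aligned positions, independent of the sequence-defined set), the significance of the overlap between the two residue sets can be scored with a hypergeometric upper-tail test. The function `overlap_pvalue` and the example residue sets are hypothetical and are not the project's method.

```python
from math import comb


def overlap_pvalue(n_positions: int, seq_set: set[int], struct_set: set[int]) -> float:
    """Hypergeometric upper-tail P(overlap >= observed) for two residue sets.

    Assumed null model (hypothetical, for illustration): the structurally
    defined set is a uniformly random draw of len(struct_set) positions out of
    n_positions, independent of the sequence-defined set.
    """
    for s in (seq_set, struct_set):
        if any(p < 1 or p > n_positions for p in s):
            raise ValueError("residue positions must lie in 1..n_positions")
    k = len(seq_set & struct_set)        # observed overlap
    a, b = len(seq_set), len(struct_set)
    total = comb(n_positions, b)         # ways to draw b positions at random
    # Sum P(X = i) over every overlap at least as large as the observed one.
    tail = sum(comb(a, i) * comb(n_positions - a, b - i)
               for i in range(k, min(a, b) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical example: 200 aligned positions, 12 positions flagged by a
    # sequence-based analysis, 15 positions near a bound ligand in a structure.
    seq_defined = {5, 9, 14, 22, 37, 41, 58, 63, 77, 90, 121, 150}
    struct_defined = {9, 14, 22, 30, 37, 41, 44, 58, 63, 88, 101, 133, 150, 171, 190}
    p = overlap_pvalue(200, seq_defined, struct_defined)
    print(f"overlap = {len(seq_defined & struct_defined)}, p = {p:.3g}")
```

A uniform null ignores structural biases such as differences between buried and surface positions, so a practical tool would need a more careful null model than this one.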
The Aim 2 tool will be modified for other purposes, including the evaluation of FRR-prediction programs. Aim 3 will
integrate Aims 1 & 2 methods and direct coupling analysis (DCA) into a nearly comprehensive system for
sequence/structural correlation analysis. (Unlike the correlations under Aims 1 & 2, DCA focuses on direct
correlations between residue pairs.) This strategy involves a high degree of model complexity, with synergistic optimization
over diverse, interdependent sequence properties and over
alternative models and parameters; hence, considerable care is required to ensure reliable results. To this end,
we will apply information-theoretic principles to adjust accurately for multiple hypotheses, to avoid under- and
over-fitting to the data, and to eliminate inherent biases. Aim 3 will also characterize the relationships among
the various types of correlations. We will apply these tools to large, functionally diverse superfamilies in
collaboration with researchers interested in these proteins. Using tools developed under Aim 2 and hundreds
of conserved domain datasets, Aim 4 will rigorously benchmark the performance of tools developed under
Aims 1 & 3 relative to competing methods. This project will aid research efforts in protein engineering, the
molecular basis of human disease, drug design and personalized medicine.
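Aim 3 contrasts the correlations of Aims 1 & 2 with DCA, which fits a global pairwise model (typically a Potts model) to the alignment so that direct couplings between two positions can be separated from correlations transmitted through other positions. A full DCA implementation does not fit in a short example. As a hedged point of comparison only, the sketch below computes the simpler local covariation score that DCA was developed to improve on: mutual information (MI) between alignment columns, with the average product correction (APC) commonly used to reduce background signal. Function names and the toy alignment are hypothetical; this is neither DCA nor the project's own correlation statistics.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mi(col_i: str, col_j: str) -> float:
    """Mutual information (in nats) between two alignment columns, from raw counts."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    # Sum over observed residue pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * log(c * n / (count_i[a] * count_j[b]))
               for (a, b), c in count_ij.items())


def mi_apc(msa: list[str]) -> dict[tuple[int, int], float]:
    """MI for every column pair, minus the average product correction (APC)."""
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if length < 2 or any(len(s) != length for s in msa):
        raise ValueError("need >= 2 columns and equal-length aligned sequences")
    cols = ["".join(seq[k] for seq in msa) for k in range(length)]
    mi = {(i, j): column_mi(cols[i], cols[j])
          for i, j in combinations(range(length), 2)}
    # Mean MI of each column with every other column.
    col_mean = [0.0] * length
    for (i, j), value in mi.items():
        col_mean[i] += value
        col_mean[j] += value
    col_mean = [m / (length - 1) for m in col_mean]
    overall = sum(mi.values()) / len(mi)
    if overall == 0.0:  # no covariation anywhere: nothing to correct
        return mi
    # APC(i, j) = mean_i * mean_j / overall mean; subtract it from the raw MI.
    return {(i, j): v - col_mean[i] * col_mean[j] / overall
            for (i, j), v in mi.items()}


if __name__ == "__main__":
    # Toy alignment (hypothetical): columns 0 and 1 co-vary perfectly,
    # column 2 is invariant, and column 3 varies independently of columns 0 and 1.
    toy_msa = ["AAGC", "AAGD", "CCGC", "CCGD"]
    scores = mi_apc(toy_msa)
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

On the toy alignment, column pair (0, 1) ranks first with a corrected score of ln(2)/3 (about 0.231) and every other pair scores 0. Real analyses also apply sequence weighting and pseudocounts, which this sketch omits.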
Project Outcomes
Journal articles (15)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach.
- DOI:10.1186/s12859-021-04108-5
- Published: 2021-04-15
- Journal:
- Impact factor: 3
- Authors: Chen X; Shi X; Neuwald AF; Hilakivi-Clarke L; Clarke R; Xuan J
- Corresponding author: Xuan J
Identifying intracellular signaling modules and exploring pathways associated with breast cancer recurrence.
- DOI:10.1038/s41598-020-79603-5
- Published: 2021-01-11
- Journal:
- Impact factor: 4.6
- Authors: Chen X; Gu J; Neuwald AF; Hilakivi-Clarke L; Clarke R; Xuan J
- Corresponding author: Xuan J
Identifying Function Determining Residues in Neuroimmune Semaphorin 4A.
- DOI:10.3390/ijms23063024
- Published: 2022-03-11
- Journal:
- Impact factor: 5.6
- Authors: Chapoval SP; Lee M; Lemmer A; Ajayi O; Qi X; Neuwald AF; Keegan AD
- Corresponding author: Keegan AD
IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles.
- DOI:10.1093/bioinformatics/btaa852
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Shi X; Neuwald AF; Wang X; Wang TL; Hilakivi-Clarke L; Clarke R; Xuan J
- Corresponding author: Xuan J
SPARC: Structural properties associated with residue constraints.
- DOI:10.1016/j.csbj.2022.04.005
- Published: 2022
- Journal:
- Impact factor: 6
- Authors:
- Corresponding author:
Other publications by ANDREW F NEUWALD
Other grants to ANDREW F NEUWALD
Predicting common protein mechanisms by the light of evolution
- Award number: 7258353
- Fiscal year: 2006
- Amount: $309,000
- Project category:
Predicting common protein mechanisms by the light of evolution
- Award number: 7651998
- Fiscal year: 2006
- Amount: $309,000
- Project category:
Predicting common protein mechanisms by the light of evolution
- Award number: 7471672
- Fiscal year: 2006
- Amount: $309,000
- Project category:
Predicting common protein mechanisms by the light of evolution
- Award number: 7683169
- Fiscal year: 2006
- Amount: $309,000
- Project category:
Predicting common protein mechanisms by the light of evolution
- Award number: 7138450
- Fiscal year: 2006
- Amount: $309,000
- Project category:
Predicting common protein mechanisms by the light of evolution
- Award number: 7497470
- Fiscal year: 2006
- Amount: $309,000
- Project category:
Advanced Sequence-Based Prediction of Protein Function
- Award number: 6796747
- Fiscal year: 1998
- Amount: $309,000
- Project category:
Advanced Sequence-Based Prediction of Protein Function
- Award number: 6559851
- Fiscal year: 1998
- Amount: $309,000
- Project category:
Advanced Sequence-Based Prediction of Protein Function
- Award number: 6584899
- Fiscal year: 1998
- Amount: $309,000
- Project category:
Advanced Sequence-Based Prediction of Protein Function
- Award number: 6650870
- Fiscal year: 1998
- Amount: $309,000
- Project category:
Similar Overseas Grants
Quantum chemical challenge to elucidate the functional mechanism of base sequence specificity deciding removal of the DNA damage
- Award number: 19K22903
- Fiscal year: 2019
- Amount: $309,000
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Theoretical Study on Relation of Base sequence and Electronic Structures toward Elucidation of Mechanism of DNA Electric Conductivity.
- Award number: 16K05666
- Fiscal year: 2016
- Amount: $309,000
- Project category: Grant-in-Aid for Scientific Research (C)
Prediction and control of base sequence recognition ability for nucleic acid binding proteins by using computer experiments.
- Award number: 14598001
- Fiscal year: 2002
- Amount: $309,000
- Project category: Grant-in-Aid for Scientific Research (C)
FLANKING BASE SEQUENCE ON MUTAGENICITY OF 8 OXOGUANINE
- Award number: 6362773
- Fiscal year: 2001
- Amount: $309,000
- Project category:
FLANKING BASE SEQUENCE ON MUTAGENICITY OF 8 OXOGUANINE
- Award number: 6137753
- Fiscal year: 2000
- Amount: $309,000
- Project category:
GROWTH HOROMON LOCALIZATION AND ITS BASE SEQUENCE IN BOVINE PANCREATIC
- Award number: 10460134
- Fiscal year: 1998
- Amount: $309,000
- Project category: Grant-in-Aid for Scientific Research (B)
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Award number: 2488608
- Fiscal year: 1997
- Amount: $309,000
- Project category:
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Award number: 6475917
- Fiscal year: 1997
- Amount: $309,000
- Project category:
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Award number: 6329024
- Fiscal year: 1997
- Amount: $309,000
- Project category:
DNA BASE SEQUENCE EFFECTS IN CHEMICAL CARCINOGENESIS
- Award number: 6124462
- Fiscal year: 1997
- Amount: $309,000
- Project category: