
Advanced correlation analyses to infer sequence and structural determinants of protein function

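Basic information

Grant number:
10093067
Principal investigator:
ANDREW F NEUWALD
Amount:
$309,000
Host institution:
UNIVERSITY OF MARYLAND BALTIMORE
Host institution country:
United States
Project category:
Fiscal year:
2018
Funding country:
United States
Project period:
2018-02-01 to 2023-01-31
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/10093067
Keywords: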
Acetyltransferase Base Sequence Bayesian Analysis Benchmarking Biochemical Biological Biomedical Research Caring Catalysis Code Collaborations Common Core Correlation Studies Coupling DNA Repair Endonuclease Data Data Set Databases Dependence Dimerization Drug Design Ensure Evaluation Feedback Formulation Goals Guanosine Triphosphate Phosphohydrolases Health Human Hydrogen Bonding Individual Investigation Joints Link Measures Mediating Methods Mind Modeling Molecular Molecular Biology Pattern Performance Phosphoric Monoester Hydrolases Positioning Attribute Process Property Protein Engineering Proteins Quality Control Reliability of Results Research Research Personnel Role Sampling Sensitivity and Specificity Sequence Alignment Specificity Speed Statistical Models Structure Subgroup System Testing Time Validation base follow-up human disease improved innovation inositol-1,4,5-trisphosphate 5-phosphatase insight interest member open source personalized medicine programs protein function three dimensional structure tool

Project summary

A long-term goal of molecular biology is assigning functional and mechanistic roles to specific protein residues, beyond the obvious roles in catalysis. Although this task is hindered by the relative sparsity of experimentally based sequence annotations, it is facilitated by an abundance of sequence data augmented by structural data. This has spurred sequence- and structure-based prediction of function-determining residues using a wide variety of methods. However, by focusing on experimentally characterized functions, these methods disfavor recognition of residues involved in important uncharacterized functions, insofar as these will be benchmarked incorrectly as false positives. Instead, this project focuses more generally on inferring functionally-relevant residues (FRRs) by allowing the sequence data itself to reveal its most statistically surprising properties without making assumptions about what will be found. We argue that, in the absence of experimental annotations, it is only possible to directly link individual residues to other residues and such residue sets to structural features. This project will make such associations by identifying sequence-to-sequence and sequence-to-structure correlations, and will focus solely on the observed data rather than on predicting (unseen) biochemical properties. The goal is to obtain hypothesis-generating observations for experimental follow-up.

Aim 1 will create advanced tools for characterizing correlated residue patterns due to functional divergence, with each pattern consisting of an arbitrary number of residues. Aim 2 will develop a tool to probabilistically assess correlations between independent sequence- and structurally-defined residue sets. This tool will be modified for other purposes, including the evaluation of FRR-prediction programs. Aim 3 will integrate Aims 1 & 2 methods and direct coupling analysis (DCA) into a nearly comprehensive system for sequence/structural correlation analysis. (Unlike the correlations under Aims 1 & 2, DCA focuses on direct correlations between residue pairs.) This strategy involves a high degree of model complexity and optimization over diverse sequence properties synergistically (due to interrelationships and dependencies) and over alternative models and parameters; hence, considerable care is required to ensure reliable results. Therefore, we will apply information theoretical principles to adjust accurately for multiple hypotheses, to avoid under- and over-fitting to the data, and to eliminate inherent biases. Aim 3 will also characterize the relationships among the various types of correlations. We will apply these tools to large, functionally diverse superfamilies in collaboration with researchers interested in these proteins. Using tools developed under Aim 2 and hundreds of conserved domain datasets, Aim 4 will rigorously benchmark the performance of tools developed under Aims 1 & 3 relative to competing methods. This project will aid research efforts in protein engineering, the molecular basis of human disease, drug design and personalized medicine.

Project outputs

Journal articles (15)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach.

DOI:
10.1186/s12859-021-04108-5
Publication date:
2021-04-15
Journal:
BMC bioinformatics
Impact factor:
3
Authors:
Chen X;Shi X;Neuwald AF;Hilakivi-Clarke L;Clarke R;Xuan J
Corresponding author:
Xuan J

Identifying intracellular signaling modules and exploring pathways associated with breast cancer recurrence.

DOI:
10.1038/s41598-020-79603-5
Publication date:
2021-01-11
Journal:
Scientific reports
Impact factor:
4.6
Authors:
Chen X;Gu J;Neuwald AF;Hilakivi-Clarke L;Clarke R;Xuan J
Corresponding author:
Xuan J

Identifying Function Determining Residues in Neuroimmune Semaphorin 4A.

DOI:
10.3390/ijms23063024
Publication date:
2022-03-11
Journal:
International journal of molecular sciences
Impact factor:
5.6
Authors:
Chapoval SP;Lee M;Lemmer A;Ajayi O;Qi X;Neuwald AF;Keegan AD
Corresponding author:
Keegan AD

IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles.

DOI:
10.1093/bioinformatics/btaa852
Publication date:
2021
Journal:
Bioinformatics (Oxford, England)
Impact factor:
0
Authors:
Shi, Xu; Neuwald, Andrew F; Wang, Xiao; Wang, Tian-Li; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua
Corresponding author:
Xuan, Jianhua

SPARC: Structural properties associated with residue constraints.