Deep learning for protein subcellular/sub-organelle localizations and localization motifs
Basic information
- Grant number: 9768571
- Principal investigator:
- Amount: $205,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-09-01 to 2021-08-31
- Project status: Completed
- Source:
- Keywords: Address; Amino Acid Sequence; Architecture; Attention; Base Sequence; Bioinformatics; Cells; Classification; Computer software; Computing Methodologies; Data; Data Sources; Detection; Development; Disease; Engineering; Enzymes; Eukaryotic Cell; Experimental Designs; Generations; Hybrids; Imagery; Label; Machine Learning; Metabolic Diseases; Methodology; Methods; Mitochondria; Modeling; N-terminal; Neural Network Simulation; Nobel Prize; Organelles; Pattern; Peptide Signal Sequences; Peptides; Plants; Protein Family; Protein translocation; Proteins; Proteome; Public Health; Research; Research Personnel; Resolution; Series; Signal Transduction; Supervision; Techniques; Technology; Training; Ubiquitination; Work; base; bioinformatics tool; computerized tools; convolutional neural network; deep learning; design; improved; innovation; interest; learning network; learning strategy; novel; online resource; predictive modeling; protein function; recurrent neural network; success; therapy design; tool
Project Summary
Eukaryotic cells have diverse cellular components, including subcellular organelles and sub-organelle
compartments. The accurate targeting of proteins to these cellular components is crucial in establishing and
maintaining cellular organizations and functions. Mis-localization of proteins is often associated with metabolic
disorders and diseases. However, the vast majority of proteins lack subcellular/sub-organelle localization
annotation. Compared with experimental methods, computational prediction of protein localization provides an
efficient and effective way for proteome annotation and experimental design. The current prediction tools for
protein localization have significant room for improvement. In addition, no tool can predict localization at
sub-organelle resolution or predict internal localization signals. Deep learning, a cutting-edge technology in machine
learning, presents a new opportunity for this classical bioinformatics problem. Recently available
high-throughput localization data also make it feasible to train deep-learning models well. The PI’s lab has demonstrated some success on
a special case, i.e., predicting mitochondrial localization in plants using deep learning.
In this project, the PI proposes to develop new methods and a standalone toolkit for accurate and scalable
protein localization prediction at the subcellular and sub-organelle levels, as well as for characterization of
localization motifs (including novel internal motifs). The general approach is to design a semi-supervised
deep-learning method that uses both annotated protein sequences with known localization and unannotated protein
sequences as training data. An unsupervised deep-learning approach will be used to learn a general
representation of protein sequences that characterizes both their local and global features.
By visualizing and characterizing the deep-learning models, novel, interpretable protein sequence
patterns will be predicted as putative targeting peptides and compared with known localization signals. We will
also use the methods to be developed, together with the unsupervised models to be trained on all protein sequences, as a
general framework for other sequence-based prediction problems that predict the label of a protein and the key
residues contributing to the label. We will make the platform highly customizable and apply it to three
applications: ubiquitination protein prediction, enzyme EC number prediction, and protein
family/subfamily classification. The innovative contributions to protein sequence-based analyses and predictions
include: (1) using raw amino acid sequences as training inputs without feature engineering; (2) utilizing the huge
amount of unannotated data through unsupervised deep learning to characterize a general protein feature
representation; (3) identifying potential targeting signals (especially internal motifs) by decoding the trained
deep-learning models, augmented with sophisticated attention mechanisms; (4) detecting multiple-organelle targeting
and sub-organelle localizations by a novel hierarchical multi-label architecture; and (5) combining features from
different data sources by a multiplicative fused CNN model.
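As a rough illustration of contributions (1) and (3), the sketch below one-hot encodes a raw amino acid sequence with no hand-crafted features and then applies a softmax attention over residue positions, so that the per-position weights can be read as a crude residue-level importance signal. This is not the project's model: the function names, the toy sequence, and the scoring vector `toy_weights` are all hypothetical stand-ins for parameters that a trained network would learn.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a raw amino acid sequence as a list of 20-dimensional one-hot rows.

    Residues outside the 20 standard letters (e.g. 'X') become all-zero rows.
    """
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(encoded, residue_weights):
    """Return per-position attention weights and the attention-weighted sum of rows.

    residue_weights is a hypothetical 20-element scoring vector standing in
    for parameters that a trained model would learn.
    """
    scores = [sum(x * w for x, w in zip(row, residue_weights)) for row in encoded]
    attention = softmax(scores)
    dim = len(residue_weights)
    pooled = [sum(a * row[j] for a, row in zip(attention, encoded)) for j in range(dim)]
    return attention, pooled


# Toy example: made-up weights that favour arginine (R) and lysine (K).
toy_weights = [0.0] * len(AMINO_ACIDS)
toy_weights[AA_INDEX["R"]] = 2.0
toy_weights[AA_INDEX["K"]] = 1.0

attention, pooled = attention_pool(one_hot("MLRKAS"), toy_weights)
top_position = max(range(len(attention)), key=attention.__getitem__)
```

In this toy setup the highest attention weight falls on the position holding the residue with the largest made-up score. In a real model the scores would come from learned layers (for example convolutional or recurrent ones) rather than a fixed per-residue vector.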
Project outcomes
Journal papers (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Computational methods for protein localization prediction.
- DOI: 10.1016/j.csbj.2021.10.023
- Publication year: 2021
- Journal:
- Impact factor: 6
- Authors: Jiang Y; Wang D; Wang W; Xu D
- Corresponding author: Xu D
MULocDeep: A deep-learning framework for protein subcellular and suborganellar localization prediction with residue-level interpretation.
- DOI: 10.1016/j.csbj.2021.08.027
- Publication year: 2021
- Journal:
- Impact factor: 6
- Authors: Jiang Y; Wang D; Yao Y; Eubel H; Künzler P; Møller IM; Xu D
- Corresponding author: Xu D
Other grants of DONG XU
Multi-view self-supervised deep learning for biological sequences and beyond
- Grant number: 10623063
- Fiscal year: 2018
- Funding amount: $205,300
- Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
- Grant number: 10395451
- Fiscal year: 2018
- Funding amount: $205,300
- Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
- Grant number: 9925232
- Fiscal year: 2018
- Funding amount: $205,300
- Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
- Grant number: 10409152
- Fiscal year: 2018
- Funding amount: $205,300
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 8656715
- Fiscal year: 2012
- Funding amount: $205,300
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 8258610
- Fiscal year: 2012
- Funding amount: $205,300
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 8469528
- Fiscal year: 2012
- Funding amount: $205,300
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 9086384
- Fiscal year: 2012
- Funding amount: $205,300
- Project category:
New Scoring, Assembly and Evaulation Techiniques for Protein Structure Prediction
- Grant number: 7648313
- Fiscal year: 2006
- Funding amount: $205,300
- Project category:
New Scoring, Assembly and Evaulation Techiniques for Protein Structure Prediction
- Grant number: 7267931
- Fiscal year: 2006
- Funding amount: $205,300
- Project category:
Similar overseas grants
Cerebral infarction treatment strategy using collagen-like "triple helix peptide" containing functional amino acid sequence
- Grant number: 23K06972
- Fiscal year: 2023
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of a screening method for functional microproteins independent of amino acid sequence conservation
- Grant number: 23KJ0939
- Fiscal year: 2023
- Funding amount: $205,300
- Project category: Grant-in-Aid for JSPS Fellows
Effects of amino acid sequence and lipids on the structure and self-association of transmembrane helices
- Grant number: 19K07013
- Fiscal year: 2019
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of electron-transfer amino acid sequence probe with an interaction for protein and cell
- Grant number: 16K05820
- Fiscal year: 2016
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)
Development of artificial antibody of anti-bitter taste receptor using random amino acid sequence library
- Grant number: 16K08426
- Fiscal year: 2016
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)
The aa15-17 amino acid sequence in the terminal protein domain of HBV polymerase as a viral factor affecting in vivo as well as in vitro replication activity of the virus.
- Grant number: 25461010
- Fiscal year: 2013
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)
Amino acid sequence analysis of fossil proteins using mass spectrometry
- Grant number: 23654177
- Fiscal year: 2011
- Funding amount: $205,300
- Project category: Grant-in-Aid for Challenging Exploratory Research
Precise hybrid synthesis of glycoprotein through amino acid sequence-specific introduction of oligosaccharide followed by enzymatic transglycosylation reaction
- Grant number: 22550105
- Fiscal year: 2010
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)
Estimating selection on amino-acid sequence polymorphisms in Drosophila
- Grant number: NE/D00232X/1
- Fiscal year: 2006
- Funding amount: $205,300
- Project category: Research Grant
Construction of a neural network for detecting novel domains from amino acid sequence information only
- Grant number: 16500189
- Fiscal year: 2004
- Funding amount: $205,300
- Project category: Grant-in-Aid for Scientific Research (C)