Interpretable and extendable deep learning model for biological sequence analysis and prediction
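Basic information
- Grant number: 10395451
- Principal investigator:
- Amount: $456,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-05-01 to 2023-07-31
- Project status: Completed
- Source: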
- Keywords: Algorithmic Software; Amino Acid Sequence; Area; Base Sequence; Big Data; Bioinformatics; Biological; Biological Models; Biology; Biomedical Research; Communities; Computational Biology; Computational algorithm; DNA; DNA Sequence; Data; Data Analyses; Development; Genotype; Goals; Healthcare; Information Systems; Knowledge; Label; Learning; Light; Machine Learning; Malignant Neoplasms; Medical; Medicine; Methods; Microbe; Modeling; Mutation; Mutation Analysis; Paper; Performance; Phenotype; Plants; Plug-in; Post-Translational Protein Processing; Property; Proteins; Public Health; Publishing; RNA; RNA Sequences; Research; Resource Informatics; Sequence Analysis; Series; Source; System; Technology; Work; computerized tools; deep learning; deep learning algorithm; deep learning model; design; drug development; improved; in silico; indexing; integrated circuit; learning strategy; machine learning method; mobile application; novel; online resource; open source; personalized diagnostics; personalized medicine; precision medicine; protein structure function; protein structure prediction; software systems; supervised learning; synthetic biology; tool; unsupervised learning
Project Abstract
Bioinformatics and computational biology have become the core of biomedical research. The work of the PI, Dr. Dong Xu,
in this area focuses on the development of novel computational algorithms, software, and information
systems, as well as on broad applications of these tools and other informatics resources to diverse biological
and medical problems. He works on many research problems in protein structure prediction, post-translational
modification prediction, high-throughput biological data analyses, in silico studies of plants, microbes, and
cancers, biological information systems, and mobile app development for healthcare. He has published more
than 300 papers, with about 12,000 citations and an h-index of 55. In this project, the PI proposes to develop
deep-learning algorithms, tools, and web resources for analyses and predictions of biological sequences, including
DNA, RNA, and protein sequences. The availability of these sequence data provides emerging opportunities for precision
medicine and other areas, while deep learning, a cutting-edge technology in machine learning, offers a
powerful new method for analyses and predictions of biological sequences. With rapidly accumulating
sequence data and the fast development of deep-learning methods, there is an urgent need to systematically
investigate how best to apply deep learning in sequence analyses and predictions. For this purpose, the PI will
develop cutting-edge deep-learning methods with the following goals for the next five years:
(1) Develop a series of novel deep-learning methods and models to specifically target biological
sequence analyses and predictions in: (a) general unsupervised representations of DNA/RNA, protein and
SNP/mutation sequences that capture both local and global features for various applications; (b) methods to
make deep-learning models interpretable for understanding biological mechanisms and generating
hypotheses; (c) “rule learning”, which abstracts the underlying “rules” by combining unsupervised learning on
large unlabeled datasets with supervised learning on small labeled datasets so that the model can classify new unlabeled data.
(2) Apply the proposed deep-learning model to DNA/RNA sequence annotation, genotype-phenotype
analyses, cancer mutation analyses, protein function/structure prediction, protein localization prediction, and
protein post-translational modification prediction. The PI will exploit particular properties associated with each
of these problems to improve the deep-learning models. He will develop a set of related prediction and analysis
tools, which will improve on state-of-the-art performance and shed some light on related biological mechanisms.
(3) Make the data, models, and tools freely accessible to the research community. The system will be
designed to be modular and open source, and will be available through GitHub. Its components will work like integrated-circuit
modules, which are universal and ready to plug into different applications. The PI will develop a web resource
for biological sequence representations, analyses, and predictions, as well as tutorials to help biologists with
no computational knowledge apply deep learning to their specific research problems.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by DONG XU
Multi-view self-supervised deep learning for biological sequences and beyond
- Grant number: 10623063
- Fiscal year: 2018
- Funding amount: $456,400
- Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
- Grant number: 9925232
- Fiscal year: 2018
- Funding amount: $456,400
- Project category:
Deep learning for protein subcellular/sub-organelle localizations and localization motifs
- Grant number: 9768571
- Fiscal year: 2018
- Funding amount: $456,400
- Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
- Grant number: 10409152
- Fiscal year: 2018
- Funding amount: $456,400
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 8656715
- Fiscal year: 2012
- Funding amount: $456,400
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 8258610
- Fiscal year: 2012
- Funding amount: $456,400
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 8469528
- Fiscal year: 2012
- Funding amount: $456,400
- Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
- Grant number: 9086384
- Fiscal year: 2012
- Funding amount: $456,400
- Project category:
New Scoring, Assembly and Evaluation Techniques for Protein Structure Prediction
- Grant number: 7648313
- Fiscal year: 2006
- Funding amount: $456,400
- Project category:
New Scoring, Assembly and Evaluation Techniques for Protein Structure Prediction
- Grant number: 7267931
- Fiscal year: 2006
- Funding amount: $456,400
- Project category:
Similar overseas grants
Cerebral infarction treatment strategy using collagen-like "triple helix peptide" containing functional amino acid sequence
- Grant number: 23K06972
- Fiscal year: 2023
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of a screening method for functional microproteins independent of amino acid sequence conservation
- Grant number: 23KJ0939
- Fiscal year: 2023
- Funding amount: $456,400
- Project category: Grant-in-Aid for JSPS Fellows
Effects of amino acid sequence and lipids on the structure and self-association of transmembrane helices
- Grant number: 19K07013
- Fiscal year: 2019
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of electron-transfer amino acid sequence probe with an interaction for protein and cell
- Grant number: 16K05820
- Fiscal year: 2016
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)
Development of artificial antibody of anti-bitter taste receptor using random amino acid sequence library
- Grant number: 16K08426
- Fiscal year: 2016
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)
The aa15-17 amino acid sequence in the terminal protein domain of HBV polymerase as a viral factor affecting in vivo as well as in vitro replication activity of the virus.
- Grant number: 25461010
- Fiscal year: 2013
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)
Amino acid sequence analysis of fossil proteins using mass spectrometry
- Grant number: 23654177
- Fiscal year: 2011
- Funding amount: $456,400
- Project category: Grant-in-Aid for Challenging Exploratory Research
Precise hybrid synthesis of glycoprotein through amino acid sequence-specific introduction of oligosaccharide followed by enzymatic transglycosylation reaction
- Grant number: 22550105
- Fiscal year: 2010
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)
Estimating selection on amino-acid sequence polymorphisms in Drosophila
- Grant number: NE/D00232X/1
- Fiscal year: 2006
- Funding amount: $456,400
- Project category: Research Grant
Construction of a neural network for detecting novel domains from amino acid sequence information only
- Grant number: 16500189
- Fiscal year: 2004
- Funding amount: $456,400
- Project category: Grant-in-Aid for Scientific Research (C)