Multi-view self-supervised deep learning for biological sequences and beyond

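Basic information

  • Grant number:
    10623063
  • Principal investigator:
  • Amount:
    $391,300
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-05-01 to 2028-07-31
  • Project status:
    Active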

Project abstract

Deep learning (DL) has demonstrated both breadth and depth in solving fundamental biological problems. DL-based approaches, such as AlphaFold2 for 3D protein structure prediction, have become widely accepted by the biology community. The Xu lab has been at the forefront of developing novel DL algorithms, software, and information systems for diverse biological and medical problems. During the current project period, the Xu lab has made excellent progress in addressing urgent challenges and needs in developing DL methods for biological sequence analysis and prediction, as well as for other bioinformatics problems. This R35 project has produced 31 papers on topics ranging from protein sequence-based prediction to drug design, molecular dynamics simulation, and single-cell data analysis. It has also provided more than ten open-source tools and three major web-based resources to the community. The rapid development of new DL techniques and the Xu lab’s accumulating expertise in this field bring new opportunities for applying DL to molecular biology. The supervised DL methods now widely used in biomedical research often lack sufficient training data with clean and accurate labels and may not generalize well. Emerging self-supervised learning (SSL) approaches, which aim to learn informative representations by exposing relationships between different data perspectives without human annotations, are becoming a new trend. These different data perspectives are broadly called multi-view. Multi-view SSL techniques allow us to generate joint or coordinated representations for single-modal and multimodal data with stronger generalizability, better robustness, and less bias. Though SSL has demonstrated great success in other fields, it has been only minimally explored in biology.
This renewal project will develop a multi-view SSL framework that can handle both single-view and multi-view data and is capable of single and multiple tasks. It will tackle key challenges and bottlenecks in applying SSL to biological studies, such as selecting effective views and data augmentations, fusing multimodal data or data from heterogeneous sources, and integrating biological constraints into SSL models. We will focus on designing a biology-informed system, enhancing generalizability and robustness, and making the results biologically interpretable and confidence-assessable. The Xu lab will apply and refine the framework in multiple mainstream biology applications, including anti-CRISPR protein prediction by exploring various data augmentation methods for protein sequences; ion and small-ligand binding prediction using complementary views of protein sequences and structures; and single-cell data analysis across different conditions. The framework will also be tested for broad applications in sequence-based studies and beyond, such as alignment-free methods for constructing phylogenetic trees and detecting novel protein families, as well as cross-species single-cell data analysis.

Project outputs

Journal articles (22)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
DeepDom: Predicting protein domain boundary from sequence alone using stacked bidirectional LSTM.
G2PDeep: a web-based deep-learning framework for quantitative phenotype prediction and discovery of genomic markers.
  • DOI:
    10.1093/nar/gkab407
  • Published:
    2021-07-02
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Zeng S; Mao Z; Ren Y; Wang D; Xu D; Joshi T
  • Corresponding author:
    Joshi T
Prediction of Protein Ion-Ligand Binding Sites with ELECTRA.
  • DOI:
    10.3390/molecules28196793
  • Published:
    2023-09-25
  • Journal:
  • Impact factor:
    0
  • Authors:
    Essien C; Jiang L; Wang D; Xu D
  • Corresponding author:
    Xu D
Sampling and ranking spatial transcriptomics data embeddings to identify tissue architecture.
  • DOI:
    10.3389/fgene.2022.912813
  • Published:
    2022
  • Journal:
  • Impact factor:
    3.7
  • Authors:
  • Corresponding author:
Large-Scale Integrative Analysis of Soybean Transcriptome Using an Unsupervised Autoencoder Model.
  • DOI:
    10.3389/fpls.2022.831204
  • Published:
    2022
  • Journal:
  • Impact factor:
    5.6
  • Authors:
    Su L; Xu C; Zeng S; Su L; Joshi T; Stacey G; Xu D
  • Corresponding author:
    Xu D

Other grants by DONG XU

Interpretable and extendable deep learning model for biological sequence analysis and prediction
  • Grant number:
    10395451
  • Fiscal year:
    2018
  • Funding amount:
    $391,300
  • Project type:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
  • Grant number:
    9925232
  • Fiscal year:
    2018
  • Funding amount:
    $391,300
  • Project type:
Deep learning for protein subcellular/sub-organelle localizations and localization motifs
  • Grant number:
    9768571
  • Fiscal year:
    2018
  • Funding amount:
    $391,300
  • Project type:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
  • Grant number:
    10409152
  • Fiscal year:
    2018
  • Funding amount:
    $391,300
  • Project type:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    8656715
  • Fiscal year:
    2012
  • Funding amount:
    $391,300
  • Project type:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    8258610
  • Fiscal year:
    2012
  • Funding amount:
    $391,300
  • Project type:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    8469528
  • Fiscal year:
    2012
  • Funding amount:
    $391,300
  • Project type:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    9086384
  • Fiscal year:
    2012
  • Funding amount:
    $391,300
  • Project type:
New Scoring, Assembly and Evaluation Techniques for Protein Structure Prediction
  • Grant number:
    7648313
  • Fiscal year:
    2006
  • Funding amount:
    $391,300
  • Project type:
New Scoring, Assembly and Evaluation Techniques for Protein Structure Prediction
  • Grant number:
    7267931
  • Fiscal year:
    2006
  • Funding amount:
    $391,300
  • Project type:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $391,300
  • Project type:
    Research Grant