Interpretable and extendable deep learning model for biological sequence analysis and prediction

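Basic information

  • Grant number:
    10409152
  • Principal investigator:
  • Amount:
    $234,800
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-05-01 to 2023-04-30
  • Project status:
    Completed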

Project abstract

SUMMARY: Single-cell sequencing technologies provide great opportunities for studying biology and medicine, but computational analysis is often the bottleneck in revealing biological insights and defining the cellular heterogeneity underlying the data. Applications of machine learning (ML), especially deep learning, hold great promise for addressing these challenges. While ML studies from various labs, including the PI's lab, have made significant progress along this line, the involvement of the ML community in single-cell data analysis is limited by the barriers of technological complexity and required biological knowledge. To attract more ML experts into this field, the PI proposes to make large-scale single-cell sequencing data ML-ready and to provide an ML-friendly development environment. Specific aims include:

(1) Collect, process, and manage diverse single-cell sequencing data to make them ML-ready. We will collect single-cell sequencing data from public sources and convert them into formats that are efficient for storage and handling. The data will be processed with multiple options, such as imputation, normalization, and dimension reduction, using a pipeline to be developed.

(2) Configure the data into benchmarks. We will use the collected data to build benchmarks, gather public benchmarks, and encourage the community to submit their own benchmarks. The data will be divided into training, validation, and test sets in multiple settings, including a minimum viable benchmark to support efficient method development and a comprehensive benchmark for full evaluations. We will develop utilities that evaluate results against a set of assessment measures and generate detailed reports. We will select a set of public tools and run them on the benchmarks as baselines for others to compare against.

(3) Provide an integrated development environment (IDE) to support partial method development. We will build an IDE for single-cell sequencing analysis method development, with plug-and-play features at the code level and a web interface, so that ML researchers can contribute and test even minimal new ideas. A report will be provided containing evaluation metrics, computing resource usage, comparisons with selected public tools, and downstream visualization and interpretation.

The newly formatted data, the benchmarks, and the method development and assessment environment will be available on GitHub and on the in-house single-cell data analysis web portal DeepMAPS. The proposed research is a natural extension of the parent grant (R35-GM126985), which aims to develop deep-learning algorithms, tools, and web resources for the analysis and prediction of biological sequences, including (1) developing general unsupervised representations and making deep-learning models interpretable for understanding biological mechanisms and generating hypotheses; (2) applying deep-learning models to a wide range of bioinformatics problems; and (3) making the data, models, and tools freely accessible to the research community. Thanks to the flexibility of the R35 mechanism, the PI's lab has extended these methods to single-cell data analysis, which has prepared the lab well for the proposed tasks.
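
As a rough illustration of the data preparation and benchmark splitting described in aims (1) and (2), the sketch below normalizes a cell-by-gene count matrix and divides the cells into training, validation, and test sets, with a smaller subsample standing in for a minimum viable benchmark. It is a minimal sketch under stated assumptions, not the project's pipeline: the function names, the library-size normalization with a log transform, the 70/15/15 split, and the synthetic Poisson counts are all illustrative choices that do not come from the abstract.

```python
import numpy as np


def normalize_counts(counts, target_sum=1e4):
    """Scale each cell (row) to the same total count, then apply log1p.

    Assumed normalization scheme; the abstract does not specify one.
    """
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)


def split_indices(n_cells, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle cell indices and cut them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_cells)
    n_train = int(round(fractions[0] * n_cells))
    n_val = int(round(fractions[1] * n_cells))
    return {
        "train": order[:n_train],
        "validation": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],  # remainder goes to the test set
    }


def subsample_split(split, max_cells, seed=0):
    """Keep at most max_cells per set (hypothetical 'minimum viable' variant)."""
    rng = np.random.default_rng(seed)
    return {
        name: rng.choice(idx, size=min(max_cells, len(idx)), replace=False)
        for name, idx in split.items()
    }


# Synthetic stand-in for a real dataset: 200 cells x 50 genes.
counts = np.random.default_rng(42).poisson(1.0, size=(200, 50))
features = normalize_counts(counts)
full_benchmark = split_indices(features.shape[0])
small_benchmark = subsample_split(full_benchmark, max_cells=20)
print({name: len(idx) for name, idx in full_benchmark.items()})
print({name: len(idx) for name, idx in small_benchmark.items()})
```

Fixing the random seed keeps the split reproducible, which matters if different methods are to be compared on the same benchmark.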

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by DONG XU

Multi-view self-supervised deep learning for biological sequences and beyond
  • Grant number:
    10623063
  • Fiscal year:
    2018
  • Funding amount:
    $234,800
  • Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
  • Grant number:
    10395451
  • Fiscal year:
    2018
  • Funding amount:
    $234,800
  • Project category:
Interpretable and extendable deep learning model for biological sequence analysis and prediction
  • Grant number:
    9925232
  • Fiscal year:
    2018
  • Funding amount:
    $234,800
  • Project category:
Deep learning for protein subcellular/sub-organelle localizations and localization motifs
  • Grant number:
    9768571
  • Fiscal year:
    2018
  • Funding amount:
    $234,800
  • Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    8656715
  • Fiscal year:
    2012
  • Funding amount:
    $234,800
  • Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    8258610
  • Fiscal year:
    2012
  • Funding amount:
    $234,800
  • Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    8469528
  • Fiscal year:
    2012
  • Funding amount:
    $234,800
  • Project category:
Development of MUFOLD for Building High-Accuracy Protein Structure Models
  • Grant number:
    9086384
  • Fiscal year:
    2012
  • Funding amount:
    $234,800
  • Project category:
New Scoring, Assembly and Evaluation Techniques for Protein Structure Prediction
  • Grant number:
    7648313
  • Fiscal year:
    2006
  • Funding amount:
    $234,800
  • Project category:
New Scoring, Assembly and Evaluation Techniques for Protein Structure Prediction
  • Grant number:
    7267931
  • Fiscal year:
    2006
  • Funding amount:
    $234,800
  • Project category:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Standard Grant
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    EU-Funded
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $234,800
  • Project category:
    Research Grant