Manifold representations and active learning for 21st century biology

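Basic information

  • Grant number:
    10401890
  • Principal investigator:
  • Amount:
    $359,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-06-01 to 2026-05-31
  • Project status:
    Not yet concluded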

Project abstract

With the rise of high-throughput sequencing and multiplexed biotechnologies enabling single-cell multi-omics and massively parallel CRISPR experiments, the biomedical community is generating a monumental amount of data. These data promise to reveal new biology and drive personalized and precision medicine. However, the sheer volume of genomic data is overwhelming current computational resources, requiring prohibitively high compute time, memory usage, and storage.

My lab has been at the forefront of solving big data challenges in genomics, designing novel algorithms that enable efficient and secure analyses that were previously computationally infeasible, and that reveal novel structural, cellular, and systems biology. Drawing upon our expertise in developing scalable and insightful algorithms for analyzing genomic, transcriptomic, and proteomic data, we aim to tackle two key data-driven challenges facing the biological community: 1) efficient, accurate, and robust characterization of tissues at the single-cell level, and 2) translating high-throughput datasets into biological discoveries via machine learning-based prediction.

To solve the first challenge, we will leverage our discovery that seemingly high-dimensional sequencing data often lie on low-dimensional manifolds that capture the underlying biological state of interest. We will design algorithms that generate these compact, meaningful manifold representations of single-cell omics datasets. This will enable a number of key applications, including characterizing co-expression and gene modules that define healthy and pathologic cell states; integrating multi-modal single-cell omics datasets to more richly characterize cellular diversity; and investigating the mechanisms underlying transcriptomic diversity across tissues and developmental states.

To solve the second challenge, we will take a two-pronged approach. First, we will design novel machine learning frameworks that provide a measure of confidence when predicting in unfamiliar biological states, enabling prediction that is robust to “out-of-distribution” (unobserved) examples. We will then work with our experimental collaborators and contract research organizations (CROs) to rapidly perform experimental validation of model-based predictions. Finally, we will return the experimental results to the model to further improve performance. This will enable an “active learning” feedback loop to efficiently explore a complex biological space for outcomes of interest. We will use this uncertainty-powered active learning approach to explore several pressing biological concerns such as the identification of small molecule compounds with enzymatic or whole-cell growth inhibitory properties, efficient design of spatial transcriptomic experiments, computationally guided CRISPR perturbation experiments, and identification of functional non-coding mutations.

This project will 1) produce numerous software tools with wide utility that efficiently analyze massive biological datasets and guide complex experimentation, and 2) reveal biological insights, especially into biomolecular interactions and cellular heterogeneity.
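The predict, validate, and retrain loop described in the abstract can be illustrated with a small sketch. This is not the project's method: it assumes a toy one-dimensional candidate pool, uses disagreement among bootstrap-resampled line fits as a stand-in for model uncertainty, and replaces the wet-lab experiment with a hypothetical function, `mock_assay`. All names here are illustrative.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for 1-D inputs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def ensemble_predict(labeled, x, rng, n_models=20):
    """Mean and spread of predictions from bootstrap-resampled line fits."""
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(labeled) for _ in labeled]
        a, b = fit_line([p[0] for p in sample], [p[1] for p in sample])
        preds.append(a * x + b)
    return statistics.fmean(preds), statistics.pstdev(preds)

def active_learning(pool, oracle, n_rounds, seed=0):
    """Repeatedly query the candidate where the ensemble disagrees most."""
    rng = random.Random(seed)
    pool = list(pool)  # work on a copy so the caller's pool is untouched
    labeled = []
    # Seed the model with two randomly chosen "experiments".
    for _ in range(2):
        x = pool.pop(rng.randrange(len(pool)))
        labeled.append((x, oracle(x)))
    for _ in range(n_rounds):
        # Uncertainty = standard deviation across the bootstrap ensemble.
        x = max(pool, key=lambda c: ensemble_predict(labeled, c, rng)[1])
        pool.remove(x)
        labeled.append((x, oracle(x)))  # "run the experiment", feed result back
    return labeled

def mock_assay(x):
    """Hypothetical stand-in for an experimental measurement."""
    return (x - 3.0) ** 2

candidates = [i * 0.5 for i in range(21)]
history = active_learning(candidates, mock_assay, n_rounds=5, seed=1)
```

Here `history` holds the queried candidates with their measured outcomes, in the order they were selected. In a real setting the line fits would be replaced by a model with calibrated uncertainty, and the selection rule might trade off uncertainty against predicted outcome rather than maximizing uncertainty alone.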

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by BONNIE BERGER

Manifold representations and active learning for 21st century biology
  • Grant number:
    10207091
  • Fiscal year:
    2021
  • Funding amount:
    $359,900
  • Project category:
Manifold representations and active learning for 21st century biology
  • Grant number:
    10670057
  • Fiscal year:
    2021
  • Funding amount:
    $359,900
  • Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
  • Grant number:
    10004966
  • Fiscal year:
    2020
  • Funding amount:
    $359,900
  • Project category:
Privacy-preserving genomic medicine at scale
  • Grant number:
    10266081
  • Fiscal year:
    2020
  • Funding amount:
    $359,900
  • Project category:
Privacy-preserving genomic medicine at scale
  • Grant number:
    10459604
  • Fiscal year:
    2020
  • Funding amount:
    $359,900
  • Project category:
Privacy-preserving genomic medicine at scale
  • Grant number:
    10662349
  • Fiscal year:
    2020
  • Funding amount:
    $359,900
  • Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
  • Grant number:
    10212991
  • Fiscal year:
    2020
  • Funding amount:
    $359,900
  • Project category:
Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
  • Grant number:
    9546755
  • Fiscal year:
    2013
  • Funding amount:
    $359,900
  • Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
  • Grant number:
    8849927
  • Fiscal year:
    2013
  • Funding amount:
    $359,900
  • Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
  • Grant number:
    8599836
  • Fiscal year:
    2013
  • Funding amount:
    $359,900
  • Project category:

Similar overseas grants

REU Site: Algorithm Design --- Theory and Engineering
  • Grant number:
    2349179
  • Fiscal year:
    2024
  • Funding amount:
    $359,900
  • Project category:
    Standard Grant
REU Site: Quantum Machine Learning Algorithm Design and Implementation
  • Grant number:
    2349567
  • Fiscal year:
    2024
  • Funding amount:
    $359,900
  • Project category:
    Standard Grant
Product structures theorems and unified methods of algorithm design for geometrically constructed graphs
  • Grant number:
    23K10982
  • Fiscal year:
    2023
  • Funding amount:
    $359,900
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Algorithm Design in Strategic and Uncertain Environments
  • Grant number:
    RGPIN-2016-03885
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Discovery Grants Program - Individual
Human-Centered Algorithm Design for High Stakes Decision-Making in Public Services
  • Grant number:
    DGECR-2022-00401
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Discovery Launch Supplement
Human-Centered Algorithm Design for High Stakes Decision-Making in Public Services
  • Grant number:
    RGPIN-2022-04570
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Discovery Grants Program - Individual
Algorithm Design
  • Grant number:
    CRC-2015-00122
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Canada Research Chairs
Control Theory and Algorithm Design for Nonlinear Systems Based on Finite Dimensionality of Holonomic Functions
  • Grant number:
    22K17855
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Grant-in-Aid for Early-Career Scientists
Scalable Algorithm Design for Unbiased Estimation via Couplings of Markov Chain Monte Carlo Methods
  • Grant number:
    2210849
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Continuing Grant
Modern mathematical models of big data-driven problems in biological sequence analysis with applications to efficient algorithm design
  • Grant number:
    569312-2022
  • Fiscal year:
    2022
  • Funding amount:
    $359,900
  • Project category:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral