Manifold representations and active learning for 21st century biology
Basic information
- Grant number: 10670057
- Principal investigator:
- Amount: $358,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-06-01 to 2026-05-31
- Project status: Ongoing
- Source:
- Keywords: Active Learning; Algorithm Design; Algorithms; Big Data; Biological; Biology; Biotechnology; Cells; Cellular biology; Clustered Regularly Interspaced Short Palindromic Repeats; Communities; Complex; Computational algorithm; Computer software; Data; Data Set; Development; Dimensions; Disease; Experimental Designs; Feedback; Genes; Genetic; Genomics; Heterogeneity; High-Throughput Nucleotide Sequencing; Individual; Machine Learning; Measures; Memory; Modeling; Mutation; Outcome; Pathologic; Performance; Phenotype; Property; Proteomics; Secure; Software Tools; Spatial Design; State Interests; Systems Biology; Time; Tissues; Translating; Uncertainty; Untranslated RNA; Validation; Work; cell growth; computing resources; design; experimental study; genomic data; high dimensionality; improved; insight; interest; machine learning framework; multimodality; multiple omics; novel; precision medicine; small molecule; structural biology; transcriptomics
Project Summary
With the rise of high-throughput sequencing and multiplexed biotechnologies enabling single-cell multi-omics
and massively parallel CRISPR experiments, the biomedical community is generating a monumental amount of
data. These data promise to reveal new biology and drive personal and precision medicine. However, the sheer
volume of genomic data is overwhelming current computational resources, requiring prohibitively high compute
time, memory usage, and storage. My lab has been at the forefront of solving big data challenges in genomics,
designing novel algorithms that enable efficient and secure analyses that were previously computationally
infeasible, and that reveal novel structural, cellular, and systems biology. Drawing upon our expertise in
developing scalable and insightful algorithms for analyzing genomic, transcriptomic, and proteomic data, we aim
to tackle two key data-driven challenges facing the biological community: 1) efficient, accurate, and robust
characterization of tissues at the single-cell level, and 2) translating high-throughput datasets into biological
discoveries via machine learning-based prediction. To solve the first challenge, we will leverage our discovery
that seemingly high-dimensional sequencing data often lie on low-dimensional manifolds that capture the
underlying biological state of interest. We will design algorithms that generate these compact, meaningful
manifold representations of single-cell omics datasets. This will enable a number of key applications including
characterizing co-expression and gene-modules that define healthy and pathologic cell states; integrating
multi-modal single-cell omics datasets to more richly characterize cellular diversity; and investigating the
mechanisms underlying transcriptomic diversity across tissues and developmental states. To solve the second
challenge, we will take a two-pronged approach. First, we will design novel machine learning frameworks that
provide a measure of confidence when predicting in unfamiliar biological states, enabling prediction that is robust
to “out-of-distribution” (unobserved) examples. We will then work with our experimental collaborators and CROs
to rapidly perform experimental validation of model-based predictions. Finally, we will return the experimental
results to the model to further improve performance. This will enable an “active learning” feedback loop to
efficiently explore a complex biological space for outcomes of interest. We will use this uncertainty-powered
active learning approach to explore several pressing biological concerns such as the identification of small
molecule compounds with enzymatic or whole-cell growth inhibitory properties, efficient design of
spatial-transcriptomic experiments, computationally guided CRISPR perturbation experiments, and identification of
functional non-coding mutations. This project will 1) produce numerous software tools with wide utility that
efficiently analyze massive biological datasets and guide complex experimentation, and 2) reveal biological
insights, especially into biomolecular interactions and cellular heterogeneity.
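The "active learning" feedback loop described in the summary (predict with a measure of confidence, validate the most uncertain predictions experimentally, return the results to the model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the project: the model is a bootstrap ensemble of ridge regressors, uncertainty is the spread of the ensemble's predictions, and the `oracle` function is a simulated stand-in for a wet-lab measurement. All function names are hypothetical.

```python
import numpy as np


def fit_ensemble(X, y, n_models=20, ridge=1e-2, rng=None):
    """Fit ridge regressors on bootstrap resamples; returns weights of shape (n_models, d)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
        Xb, yb = X[idx], y[idx]
        # Closed-form ridge solution; the ridge term keeps the system well-posed.
        w = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(d), Xb.T @ yb)
        weights.append(w)
    return np.stack(weights)


def predict_with_uncertainty(weights, X):
    """Return the ensemble mean and standard deviation for each row of X."""
    preds = X @ weights.T  # shape (n_samples, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def active_learning_loop(X_pool, oracle, n_init=5, n_rounds=10, batch=3, seed=0):
    """Repeatedly measure the candidates the ensemble is least certain about."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    y = {i: oracle(X_pool[i]) for i in labeled}
    for _ in range(n_rounds):
        y_train = np.array([y[i] for i in labeled])
        weights = fit_ensemble(X_pool[labeled], y_train, rng=rng)
        _, std = predict_with_uncertainty(weights, X_pool)
        std[labeled] = -np.inf  # never re-query an already measured candidate
        for i in np.argsort(std)[-batch:]:  # the most uncertain candidates
            y[int(i)] = oracle(X_pool[i])   # simulated "experiment"
            labeled.append(int(i))
    return labeled, y


# Toy usage: a synthetic candidate pool and a noiseless linear "experiment".
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
oracle = lambda x: float(x @ true_w)
labeled, y = active_learning_loop(X_pool, oracle)
print(len(labeled), "candidates measured")
```

Ensemble disagreement is only one of several possible uncertainty measures; the summary does not say which model or acquisition rule the project uses.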
Project outcomes
Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Equivariant Scalar Fields for Molecular Docking with Fast Fourier Transforms
- DOI: 10.48550/arxiv.2312.04323
- Publication date: 2023-12
- Journal:
- Impact factor: 0
- Authors: Bowen Jing; T. Jaakkola; Bonnie Berger
- Corresponding authors: Bowen Jing; T. Jaakkola; Bonnie Berger
Other grants by BONNIE BERGER
Manifold representations and active learning for 21st century biology
- Grant number: 10401890
- Fiscal year: 2021
- Funding amount: $358,900
- Project category:
Manifold representations and active learning for 21st century biology
- Grant number: 10207091
- Fiscal year: 2021
- Funding amount: $358,900
- Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10004966
- Fiscal year: 2020
- Funding amount: $358,900
- Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10212991
- Fiscal year: 2020
- Funding amount: $358,900
- Project category:
Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
- Grant number: 9546755
- Fiscal year: 2013
- Funding amount: $358,900
- Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
- Grant number: 8849927
- Fiscal year: 2013
- Funding amount: $358,900
- Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
- Grant number: 8599836
- Fiscal year: 2013
- Funding amount: $358,900
- Project category:
Similar overseas grants
REU Site: Algorithm Design --- Theory and Engineering
- Grant number: 2349179
- Fiscal year: 2024
- Funding amount: $358,900
- Project category: Standard Grant
REU Site: Quantum Machine Learning Algorithm Design and Implementation
- Grant number: 2349567
- Fiscal year: 2024
- Funding amount: $358,900
- Project category: Standard Grant
Product structures theorems and unified methods of algorithm design for geometrically constructed graphs
- Grant number: 23K10982
- Fiscal year: 2023
- Funding amount: $358,900
- Project category: Grant-in-Aid for Scientific Research (C)
Algorithm Design in Strategic and Uncertain Environments
- Grant number: RGPIN-2016-03885
- Fiscal year: 2022
- Funding amount: $358,900
- Project category: Discovery Grants Program - Individual
Human-Centered Algorithm Design for High Stakes Decision-Making in Public Services
- Grant number: DGECR-2022-00401
- Fiscal year: 2022
- Funding amount: $358,900
- Project category: Discovery Launch Supplement
Human-Centered Algorithm Design for High Stakes Decision-Making in Public Services
- Grant number: RGPIN-2022-04570
- Fiscal year: 2022
- Funding amount: $358,900
- Project category: Discovery Grants Program - Individual
Control Theory and Algorithm Design for Nonlinear Systems Based on Finite Dimensionality of Holonomic Functions
- Grant number: 22K17855
- Fiscal year: 2022
- Funding amount: $358,900
- Project category: Grant-in-Aid for Early-Career Scientists
Scalable Algorithm Design for Unbiased Estimation via Couplings of Markov Chain Monte Carlo Methods
- Grant number: 2210849
- Fiscal year: 2022
- Funding amount: $358,900
- Project category: Continuing Grant
Spectral Techniques in Algorithm Design and Analysis
- Grant number: RGPIN-2020-04385
- Fiscal year: 2022
- Funding amount: $358,900
- Project category: Discovery Grants Program - Individual