Manifold representations and active learning for 21st century biology
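Basic information
- Grant number: 10207091
- Principal investigator:
- Amount: $478,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-06-01 to 2026-05-31
- Project status: Ongoing
- Source: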
- Keywords: Active Learning; Algorithm Design; Algorithmic Analysis; Algorithmic Software; Algorithms; Big Data; Biological; Biology; Biotechnology; Cells; Cellular biology; Clustered Regularly Interspaced Short Palindromic Repeats; Communities; Complex; Computational algorithm; Data; Data Set; Development; Dimensions; Disease; Experimental Designs; Feedback; Genes; Genetic; Genomics; Heterogeneity; High-Throughput Nucleotide Sequencing; Individual; Machine Learning; Measures; Memory; Modeling; Mutation; Outcome; Pathologic; Performance; Phenotype; Property; Proteomics; Secure; Software Tools; Spatial Design; State Interests; Systems Biology; Time; Tissues; Translating; Uncertainty; Untranslated RNA; Validation; Work; base; cell growth; computing resources; design; experimental study; genomic data; high dimensionality; improved; insight; interest; multimodality; multiple omics; novel; precision medicine; small molecule; structural biology; transcriptomics
Project Summary
With the rise of high-throughput sequencing and multiplexed biotechnologies enabling single-cell multi-omics
and massively parallel CRISPR experiments, the biomedical community is generating a monumental amount of
data. These data promise to reveal new biology and drive personal and precision medicine. However, the sheer
volume of genomic data is overwhelming current computational resources, requiring prohibitively high compute
time, memory usage, and storage. My lab has been at the forefront of solving big data challenges in genomics,
designing novel algorithms that enable efficient and secure analyses that were previously computationally
infeasible, and that reveal novel structural, cellular, and systems biology. Drawing upon our expertise in
developing scalable and insightful algorithms for analyzing genomic, transcriptomic, and proteomic data, we aim
to tackle two key data-driven challenges facing the biological community: 1) efficient, accurate, and robust
characterization of tissues at the single-cell level, and 2) translating high-throughput datasets into biological
discoveries via machine learning-based prediction. To solve the first challenge, we will leverage our discovery
that seemingly high-dimensional sequencing data often lie on low-dimensional manifolds that capture the
underlying biological state of interest. We will design algorithms that generate these compact, meaningful
manifold representations of single-cell omics datasets. This will enable a number of key applications including
characterizing co-expression and gene-modules that define healthy and pathologic cell states; integrating
multi-modal single-cell omics datasets to more richly characterize cellular diversity; and investigating the
mechanisms underlying transcriptomic diversity across tissues and developmental states. To solve the second
challenge, we will take a two-pronged approach. First, we will design novel machine learning frameworks that
provide a measure of confidence when predicting in unfamiliar biological states, enabling prediction that is robust
to “out-of-distribution” (unobserved) examples. We will then work with our experimental collaborators and contract research organizations (CROs)
to rapidly perform experimental validation of model-based predictions. Finally, we will return the experimental
results to the model to further improve performance. This will enable an “active learning” feedback loop to
efficiently explore a complex biological space for outcomes of interest. We will use this uncertainty-powered
active learning approach to explore several pressing biological concerns such as the identification of small
molecule compounds with enzymatic or whole-cell growth inhibitory properties, efficient design of spatial-
transcriptomic experiments, computationally guided CRISPR perturbation experiments, and identification of
functional non-coding mutations. This project will 1) produce numerous software tools with wide utility that
efficiently analyze massive biological datasets and guide complex experimentation, and 2) reveal biological
insights, especially into biomolecular interactions and cellular heterogeneity.
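The "active learning" feedback loop described in the summary (predict with a measure of confidence, validate the most promising candidates experimentally, return the results to the model) can be sketched as below. This is only an illustrative sketch under stated assumptions, not the project's method: the bootstrap ridge ensemble, the upper-confidence-bound score, and every name (`fit_ridge`, `ensemble_predict`, `active_learning`, `run_experiment`, `kappa`) are hypothetical choices made for the example, and the "experiment" is a simulated function standing in for a wet-lab assay.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression; returns the weight vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ensemble_predict(X_train, y_train, X_pool, n_models=20, rng=None):
    """Bootstrap ensemble: mean prediction and disagreement (std) per candidate."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample training set with replacement
        w = fit_ridge(X_train[idx], y_train[idx])
        preds.append(X_pool @ w)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def active_learning(X_pool, run_experiment, n_init=5, n_rounds=4, batch=3,
                    kappa=1.0, seed=0):
    """Uncertainty-guided loop: predict, pick candidates, measure, refit."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    y = {i: run_experiment(X_pool[i]) for i in labeled}
    best_so_far = [max(y.values())]
    for _ in range(n_rounds):
        X_tr = X_pool[labeled]
        y_tr = np.array([y[i] for i in labeled])
        mean, std = ensemble_predict(X_tr, y_tr, X_pool, rng=rng)
        score = mean + kappa * std   # favor high predictions and high uncertainty
        score[labeled] = -np.inf     # never re-test an already measured candidate
        picks = np.argsort(score)[-batch:]
        for i in picks:
            y[int(i)] = run_experiment(X_pool[i])  # simulated assay result
            labeled.append(int(i))
        best_so_far.append(max(y.values()))
    return labeled, best_so_far

# Simulated candidate pool and hidden ground truth (both made up for the demo).
demo_rng = np.random.default_rng(1)
X_pool = demo_rng.normal(size=(200, 8))
true_w = demo_rng.normal(size=8)
labeled, best = active_learning(X_pool, lambda x: float(x @ true_w))
```

Here `kappa` trades off exploiting high predicted outcomes against exploring candidates on which the ensemble disagrees. A real application would replace the linear ensemble with a model suited to the data and `run_experiment` with actual measurements.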
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by BONNIE BERGER
Manifold representations and active learning for 21st century biology
- Grant number: 10401890
- Fiscal year: 2021
- Funding amount: $478,300
- Project category:
Manifold representations and active learning for 21st century biology
- Grant number: 10670057
- Fiscal year: 2021
- Funding amount: $478,300
- Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10004966
- Fiscal year: 2020
- Funding amount: $478,300
- Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10212991
- Fiscal year: 2020
- Funding amount: $478,300
- Project category:
Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
- Grant number: 9546755
- Fiscal year: 2013
- Funding amount: $478,300
- Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
- Grant number: 8849927
- Fiscal year: 2013
- Funding amount: $478,300
- Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
- Grant number: 8599836
- Fiscal year: 2013
- Funding amount: $478,300
- Project category:
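Similar National Natural Science Foundation of China (NSFC) grants
Design, analysis, and application of numerical algorithms for hybrid optimal control of nonlinear systems under probabilistic constraints
- Grant number: 62363005
- Approval year: 2023
- Funding amount: CNY 320,000
- Project category: Regional Science Fund Project
Algorithm design and analysis for stochastic density functional theory
- Grant number: 12371431
- Approval year: 2023
- Funding amount: CNY 435,000
- Project category: General Program
Design and analysis of efficient lattice-based encryption algorithms
- Grant number: 62372445
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Algorithm design, analysis, and experiments for relaxed clique extraction problems
- Grant number: 62372093
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Algorithm design and theoretical analysis for distributed machine learning
- Grant number: 62376008
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program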
Similar overseas grants
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Funding amount: $478,300
- Project category:
An acquisition and analysis pipeline for integrating MRI and neuropathology in TBI-related dementia and VCID
- Grant number: 10810913
- Fiscal year: 2023
- Funding amount: $478,300
- Project category:
A novel, one stop, affordable, point of care and artificial intelligence supported system of screening, triage and treatment selection for cervical cancer and precancer in the LMICs
- Grant number: 10560812
- Fiscal year: 2023
- Funding amount: $478,300
- Project category:
Remote Kinesiology for Improving Human Health with Auto-locating Compliant Motion Tracking Stickers and Artificial Intelligence
- Grant number: 10751952
- Fiscal year: 2023
- Funding amount: $478,300
- Project category:
Sugar Probed SRS Volumetric imaging of Metabolic Activities
- Grant number: 10639208
- Fiscal year: 2023
- Funding amount: $478,300
- Project category: