Implicit generative modeling for computational genomics
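Basic information
- Grant number: RGPIN-2020-06770
- Principal investigator:
- Amount: $29,100
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Start and end dates: 2022-01-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract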
Genomics is the study of DNA sequences in our cells and how they drive function or dysfunction. Genomics data also reveals the flow of information within cells. By comparing genomics data captured under different conditions (mutations or drug treatments) we can observe how they affect cell function. Machine learning can do more than merely observe: it can learn the complex "rules" by which DNA drives function, directly from data. Once learned, these rules can predict which mutations are likely to cause a gene to stop functioning and cause disease. Genomics data can increasingly be generated by automated laboratories, including remote cloud laboratories, which offer better accuracy and consistency than traditional laboratory work. This combination of information-rich data and automated experimentation is finally setting the stage for "self-driven" laboratories. In a self-driven laboratory, experiments are designed by machine learning systems to systematically interrogate an outcome of interest, such as the sequence-function relationships in human cells. Data and models are improved in a 24/7 feedback loop. The proposed research aims to make tractable progress towards this vision. This vision is important because it represents an opportunity for drug discovery. Today, 90% of all drug candidates fail at the clinical stage, after years of development and tens or hundreds of millions of dollars have been spent. Better predictive models of how cells work will help to identify and design compounds for genetic disorders up front, with a higher final success rate. The proposed research focuses on four objectives. The first is to make it easier for machine learning scientists to participate in advancing this vision. Genomics data is notoriously difficult to understand and to work with, so the goal is to build a code library and data repository that enables the machine learning community to contribute to genomics research. 
The second objective is to improve the accuracy with which genomics data is interpreted, specifically data from RNA sequencing, or RNA-seq. RNA-seq captures a "snapshot" of the RNA molecules in a sample, and is the penultimate step in countless protocols for interrogating cells. However, the snapshot is fragmented. Algorithms are needed to infer (that is, to reconstruct) which RNA molecules were actually present in the sample. The proposed research will investigate a new, more accurate approach to inference based on deep learning. The third objective is to use new deep learning techniques to build more accurate models of how RNA is processed in cells. Models of RNA processing are important for predicting the effects of mutations and of potential therapies. (Such models are also based on RNA-seq data, and may benefit from advances in the second objective.) The fourth and final objective is to create algorithms that can automatically design experiments, generating the most valuable data possible from which to build models of RNA biology.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Delong, Andrew
Interactive Segmentation with Super-Labels.
- DOI: 10.1007/978-3-642-23094-3_11
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Delong, Andrew; Gorelick, Lena; Schmidt, Frank R; Veksler, Olga; Boykov, Yuri
- Corresponding author: Boykov, Yuri
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
- DOI: 10.1038/nbt.3300
- Published: 2015-08-01
- Journal:
- Impact factor: 46.9
- Authors: Alipanahi, Babak; Delong, Andrew; Frey, Brendan J.
- Corresponding author: Frey, Brendan J.
An integral solution to surface evolution PDEs via geo-cuts
- DOI: 10.1007/11744078_32
- Published: 2006-01-01
- Journal:
- Impact factor: 0
- Authors: Boykov, Yuri; Kolmogorov, Vladimir; Delong, Andrew
- Corresponding author: Delong, Andrew
Fast Approximate Energy Minimization with Label Costs
- DOI: 10.1007/s11263-011-0437-z
- Published: 2012-01-01
- Journal:
- Impact factor: 19.5
- Authors: Delong, Andrew; Osokin, Anton; Boykov, Yuri
- Corresponding author: Boykov, Yuri
Other grants by Delong, Andrew
Implicit generative modeling for computational genomics
- Grant number: RGPIN-2020-06770
- Fiscal year: 2021
- Amount: $29,100
- Program: Discovery Grants Program - Individual
Implicit generative modeling for computational genomics
- Grant number: RGPIN-2020-06770
- Fiscal year: 2020
- Amount: $29,100
- Program: Discovery Grants Program - Individual
Implicit generative modeling for computational genomics
- Grant number: DGECR-2020-00323
- Fiscal year: 2020
- Amount: $29,100
- Program: Discovery Launch Supplement
Learning and Inference for Hierarchical Models in Computer Vision
- Grant number: 421338-2012
- Fiscal year: 2013
- Amount: $29,100
- Program: Postdoctoral Fellowships
Learning and Inference for Hierarchical Models in Computer Vision
- Grant number: 421338-2012
- Fiscal year: 2012
- Amount: $29,100
- Program: Postdoctoral Fellowships
Discrete optimisation methods in vision and biomedical imaging
- Grant number: 348322-2007
- Fiscal year: 2009
- Amount: $29,100
- Program: Postgraduate Scholarships - Doctoral
Discrete optimisation methods in vision and biomedical imaging
- Grant number: 348322-2007
- Fiscal year: 2008
- Amount: $29,100
- Program: Postgraduate Scholarships - Doctoral
Discrete optimisation methods in vision and biomedical imaging
- Grant number: 348322-2007
- Fiscal year: 2007
- Amount: $29,100
- Program: Postgraduate Scholarships - Doctoral
Similar overseas grants
CIF: Small: Towards a Control Framework for Neural Generative Modeling
- Grant number: 2348624
- Fiscal year: 2024
- Amount: $29,100
- Program: Standard Grant
SG: Species Distribution Modeling on the A.I. frontier: Deep generative models for powerful, general and accessible SDM
- Grant number: 2329701
- Fiscal year: 2024
- Amount: $29,100
- Program: Standard Grant
CAREER: Generative Physical Modeling for Computational Imaging Systems
- Grant number: 2239687
- Fiscal year: 2023
- Amount: $29,100
- Program: Continuing Grant
Information-Theoretic Surprise-Driven Approach to Enhance Decision Making in Healthcare
- Grant number: 10575550
- Fiscal year: 2023
- Amount: $29,100
- Program:
Proteasomal recruiters of PAX3-FOXO1 Designed via Sequence-Based Generative Models
- Grant number: 10826068
- Fiscal year: 2023
- Amount: $29,100
- Program:
Characterizing the generative mechanisms underlying the cortical tracking of natural speech
- Grant number: 10710717
- Fiscal year: 2023
- Amount: $29,100
- Program:
Geles: A Novel Imaging Informatics System for Generalizable Lesion Identification in Neuroendocrine Tumors
- Grant number: 10740578
- Fiscal year: 2023
- Amount: $29,100
- Program:
AI-Powered Uncovering of Mechanisms in Cancer Through Causal Discovery Analysis and Generative Modeling of Heterogeneous Data
- Grant number: 10581180
- Fiscal year: 2023
- Amount: $29,100
- Program:
Neural Operator Learning to Predict Aneurysmal Growth and Outcomes
- Grant number: 10636358
- Fiscal year: 2023
- Amount: $29,100
- Program: