CAREER: Modeling Genetic Variation Using Deep Generative Neural Networks

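Basic Information

  • Award number:
    2145577
  • Principal investigator:
  • Amount:
    $498,900
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-07-01 to 2027-06-30
  • Project status:
    Ongoing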

Project Abstract

The cost of genome sequencing has decreased by orders of magnitude over the past two decades, enabling the creation of datasets comprising up to millions of genomes of plants, animals, and humans. The long-term vision behind this research is to develop new methods for analyzing such datasets and to understand how genetic and environmental factors determine complex traits relevant to medicine and agriculture. Existing methods for analyzing genetic data often struggle with the size and complexity of today's massive datasets. This research seeks to improve existing approaches via novel techniques in artificial intelligence and machine learning. Specifically, this project will develop new mathematical models of genomic sequences that will serve as the basis for algorithms for genetic data analysis, including tasks such as analyzing human ancestry and understanding the effect of genetics on disease. The second part of the project will explore a specific application of the new models, namely assaying genomic sequences with high accuracy at low cost, and will develop open-source software for this task. This software will help lower the cost of acquiring massive genetic datasets and will facilitate large-scale genetic studies. These efforts will positively impact downstream applications that rely on accurate genomes (medical genetics, animal breeding, and others) and will contribute to enabling cheaper and more accurate medical diagnosis, explaining the role of genetics in human disease, and helping breed more nourishing crops, ultimately improving human and environmental health.
The long-term research vision behind this project is to create next-generation algorithms for statistical genetics based on novel methods in machine learning and deep learning. This proposal takes a first step in this direction by creating novel methods for modeling genetic variation and applying them to two important problems in statistical genetics: genotype imputation and low-pass genome sequencing. Both problems involve determining the complete sequence of a genome from a small number of measurements obtained using an inexpensive assay. Specifically, this research has two primary aims: (1) to develop a novel deep generative model of genetic sequences that replaces classical approaches based on hidden Markov models and that can serve as the foundation for algorithms throughout statistical genetics; (2) to significantly reduce the cost of genomic assays via novel algorithms for imputation and low-pass sequencing based on the new model. Central to this effort is the development of new techniques, approaches, and frameworks in deep generative modeling that address challenges posed by genetic data, including high dimensionality and long-range sequence dependencies, and that are useful beyond genomics. Ultimately, we envision this work laying the foundation for a new field of deep statistical genetics and inspiring new algorithms for problems throughout the field, including haplotyping, ancestry inference, genome-wide association study analysis, polygenic risk scoring, and beyond.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies
  • DOI:
  • Publication date:
    2022-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shachi Deshpande;Kaiwen Wang;Dhruv Sreenivas;Zheng Li;Volodymyr Kuleshov
  • Corresponding authors:
    Shachi Deshpande;Kaiwen Wang;Dhruv Sreenivas;Zheng Li;Volodymyr Kuleshov
Semi-Parametric Inducing Point Networks and Neural Processes
  • DOI:
  • Publication date:
    2022-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    R. Rastogi;Yair Schiff;Alon Hacohen;Zhaozhi Li;I-Hsiang Lee;Yuntian Deng;M. Sabuncu;Volodymyr Kuleshov
  • Corresponding authors:
    R. Rastogi;Yair Schiff;Alon Hacohen;Zhaozhi Li;I-Hsiang Lee;Yuntian Deng;M. Sabuncu;Volodymyr Kuleshov
Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation
  • DOI:
  • Publication date:
    2021-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Volodymyr Kuleshov;Shachi Deshpande
  • Corresponding authors:
    Volodymyr Kuleshov;Shachi Deshpande

Other publications by Volodymyr Kuleshov

Inverse Game Theory: Learning Utilities in Succinct Games
Synthetic long read sequencing reveals the composition and intraspecies diversity of the human microbiome
  • DOI:
  • Publication date:
    2015
  • Journal:
  • Impact factor:
    46.9
  • Authors:
    Volodymyr Kuleshov;Chao Jiang;Wenyu Zhou;F. Jahanbani;S. Batzoglou;Michael P. Snyder
  • Corresponding author:
    Michael P. Snyder
Time Series Super Resolution with Temporal Adaptive Batch Normalization
  • DOI:
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Volodymyr Kuleshov;Sawyer Birnbaum;Zayd Enam;Pang Wei Koh;Stefano Ermon
  • Corresponding author:
    Stefano Ermon
Simple and Effective Masked Diffusion Language Models
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    S. Sahoo;Marianne Arriola;Yair Schiff;Aaron Gokaslan;Edgar Marroquin;Justin T Chiu;Alexander Rush;Volodymyr Kuleshov
  • Corresponding author:
    Volodymyr Kuleshov
Harnessing Biomedical Literature to Calibrate Clinicians’ Trust in AI Decision Support Systems

Similar NSFC Grants

Galaxy Analytical Modeling Evolution (GAME) and cosmological hydrodynamic simulations.
  • Award number:
  • Approval year:
    2025
  • Funding amount:
    CNY 100,000
  • Award type:
    Provincial/municipal-level project

Similar Overseas Grants

Deconvoluting the Ewing sarcoma genetic program using ancestry-informed human iPSC modeling
  • Award number:
    10562800
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
Modeling genetic contributions to biliary atresia
  • Award number:
    10639240
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
Modeling transcriptional and post-transcriptional systems for regulating non-genetic heterogeneity in mammalian cells
  • Award number:
    10623648
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
Diversity in a Dish: Pluripotent Stem Cells in Genetic Analysis and Disease Modeling
  • Award number:
    10608751
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
A community resource for germline and somatic genetic disease modeling in zebrafish
  • Award number:
    10723158
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
Population genetic modeling of genetic variation for complex traits and diseases
  • Award number:
    10714605
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
21-BBSRC/NSF-BIO: Modeling of protein interactions to predict phenotypic effects of genetic mutations
  • Award number:
    BB/X01830X/1
  • Fiscal year:
    2023
  • Funding amount:
    $498,900
  • Award type:
    Research Grant
Genetic Modeling of Diet, NFkB, and Metabolic Interactions
  • Award number:
    10501274
  • Fiscal year:
    2022
  • Funding amount:
    $498,900
  • Award type:
Modeling and Therapeutic Approaches for Genetic Vasculopathies
  • Award number:
    10706537
  • Fiscal year:
    2022
  • Funding amount:
    $498,900
  • Award type:
Characterizing Genetic, Neurotransmitter receptors and Macroscopic Brain Interactions via Multi-scale Analytical Modeling
  • Award number:
    RGPIN-2021-02670
  • Fiscal year:
    2022
  • Funding amount:
    $498,900
  • Award type:
    Discovery Grants Program - Individual