Deep learning for population genetics

Basic information

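  • Grant number:
    9976348
  • Principal investigator:
  • Amount:
    $529,200
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-04-21 to 2024-02-28
  • Project status:
    Completed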

Project abstract

The revolution in genome sequencing technologies over the past 15 years has created an explosion of population genomic data but has left in its wake a gap in our ability to make sense of data at this scale. In particular, whereas population genetics as a field has been traditionally data-limited, the massive volume of current sequencing means that previously unanswerable questions may now be within reach. To capitalize on this flood of information we need new methods and modes of analysis.
In the past 5 years the world of machine learning has been revolutionized by the rise of deep neural networks. These so-called deep learning methods offer incredible flexibility as well as astounding improvements in performance for a wide array of machine learning tasks, including computer vision, speech recognition, and natural language processing. This proposal aims to harness the great potential of deep learning for population genetic inference.
In recent years our group has made great strides in using supervised machine learning for population genomic analysis (reviewed in Schrider and Kern 2018). However, this work has focused primarily on using more traditional machine learning methods such as random forests. As we argue in this proposal, DNA sequence data are particularly well suited for modern deep learning techniques, and we demonstrate that the application of these methods can rapidly lead to state-of-the-art performance in very difficult population genetic tasks such as estimating rates of recombination. The power of these methods for handling genetic data stems in part from their ability to automatically learn to extract as much useful information as possible from an alignment of DNA sequences in order to solve the task at hand, rather than relying on one or more predefined summary statistics which are generally problem-specific and may omit information present in the raw data.
In this proposal we lay out a systematic approach for both empowering the field with these tools and understanding their shortcomings. In particular, we propose to design deep neural networks for solving population genetic problems, and incorporate successful networks into user-friendly software tools that will be shared with the community. We will also investigate a variety of methods for estimating the uncertainty of predictions produced by deep learning methods; this area is understudied in machine learning but of great importance to biological researchers who require an accurate measure of the degree of uncertainty surrounding an estimate. Finally, we will explore the impact of training data misspecification—wherein the data used to train a machine learning method differ systematically from the data to which it will be applied in practice. We will devise techniques to mitigate the impact of such misspecification in order to ensure that our tools will be robust to the complications inherent in analyzing real genomic data sets. Together, these advances have the potential to transform the methodological landscape of population genetic inference.
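
The contrast the abstract draws between predefined summary statistics and the raw alignment can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the project: the toy alignment values are invented, the function names are hypothetical, and the two statistics shown (number of segregating sites and mean pairwise differences) are simply common textbook choices. No neural network is trained here; the point is only how much of the input each representation retains.

```python
# Hypothetical toy alignment: rows are sampled chromosomes, columns are sites.
# 0 = ancestral allele, 1 = derived allele. Values are invented for illustration.
alignment = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
]


def segregating_sites(aln):
    """Count columns in which both alleles appear in the sample."""
    return sum(1 for col in zip(*aln) if 0 < sum(col) < len(col))


def nucleotide_diversity(aln):
    """Mean number of differences over all pairs of sampled chromosomes."""
    n = len(aln)
    pairs = n * (n - 1) // 2
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(aln[i], aln[j]))
    return total / pairs


def flatten(aln):
    """Raw-input view: keep every genotype, row by row."""
    return [allele for row in aln for allele in row]


# Summary-statistic view: the whole alignment is reduced to two numbers.
summary_features = [segregating_sites(alignment), nucleotide_diversity(alignment)]

# Raw view: all 20 genotypes are kept, so a model could in principle learn
# patterns that the two statistics above discard.
raw_features = flatten(alignment)
```

Under these assumptions, many different alignments share the same two summary values, whereas the raw view distinguishes them; that is the information loss the abstract refers to.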

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants held by ANDREW D KERN

Computational Population Genetics
  • Grant number:
    10552275
  • Fiscal year:
    2023
  • Funding amount:
    $529,200
  • Project type:
Deep learning for population genetics
  • Grant number:
    10349557
  • Fiscal year:
    2020
  • Funding amount:
    $529,200
  • Project type:
Deep learning for population genetics
  • Grant number:
    10574510
  • Fiscal year:
    2020
  • Funding amount:
    $529,200
  • Project type:
Population genomics of adaptation
  • Grant number:
    9383198
  • Fiscal year:
    2017
  • Funding amount:
    $529,200
  • Project type:
POPULATION GENOMICS OF ADAPTATION
  • Grant number:
    9753261
  • Fiscal year:
    2017
  • Funding amount:
    $529,200
  • Project type:
Human Population Genomics
  • Grant number:
    7053104
  • Fiscal year:
    2005
  • Funding amount:
    $529,200
  • Project type:
Human Population Genomics
  • Grant number:
    7283831
  • Fiscal year:
    2005
  • Funding amount:
    $529,200
  • Project type:
Human Population Genomics
  • Grant number:
    7146707
  • Fiscal year:
    2005
  • Funding amount:
    $529,200
  • Project type:

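Similar NSFC grants

Mechanism by which the nitrogen metabolism regulator AreA mediates fumonisin FB1 biosynthesis in Fusarium proliferatum
  • Grant number:
    2021JJ40433
  • Approval year:
    2021
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal-level project
Elucidating how host-induced silencing of the AreA and CYP51 genes of the pokkah boeng pathogen enhances disease resistance in sugarcane
  • Grant number:
    32001603
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund project
Porting, improvement, and application of the AREA international economic model
  • Grant number:
    18870435
  • Approval year:
    1988
  • Funding amount:
    CNY 20,000
  • Project type:
    General Program project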

Similar overseas grants

Tribal Intertidal Digital Ecological Surveys Project: Using Large-Area Imaging to Assess Intertidal Biological Response to Changing Oceanographic Conditions in Partnership with Indigenous Nations
  • Grant number:
    532685-2019
  • Fiscal year:
    2022
  • Funding amount:
    $529,200
  • Project type:
    Postgraduate Scholarships - Doctoral
Tribal Intertidal Digital Ecological Surveys Project: Using Large-Area Imaging to Assess Intertidal Biological Response to Changing Oceanographic Conditions in Partnership with Indigenous Nations
  • Grant number:
    532685-2019
  • Fiscal year:
    2020
  • Funding amount:
    $529,200
  • Project type:
    Postgraduate Scholarships - Doctoral
biological interactions among forest-dwelling fungus gnats and their natural enemies in shiitake mashroom production area
  • Grant number:
    19K06152
  • Fiscal year:
    2019
  • Funding amount:
    $529,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Tribal Intertidal Digital Ecological Surveys Project: Using Large-Area Imaging to Assess Intertidal Biological Response to Changing Oceanographic Conditions in Partnership with Indigenous Nations
  • Grant number:
    532685-2019
  • Fiscal year:
    2019
  • Funding amount:
    $529,200
  • Project type:
    Postgraduate Scholarships - Doctoral
To what extent does governance play a role in how effectively a marine protected area in the Irish Sea reaches its biological and socioeconomic goals?
  • Grant number:
    2287487
  • Fiscal year:
    2019
  • Funding amount:
    $529,200
  • Project type:
    Studentship
War and Biological Ageing in Vietnam: A Planning Grant to Foster Collaboration on a Novel Area of Global Research in Health and Ageing
  • Grant number:
    404425
  • Fiscal year:
    2019
  • Funding amount:
    $529,200
  • Project type:
    Miscellaneous Programs
Impact assessment of Noctiluca scintillans red tide on nutrient dynamics, biological processes in lower trophic levels and material cycle in the neritic area of Sagami Bay
  • Grant number:
    18K05794
  • Fiscal year:
    2018
  • Funding amount:
    $529,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Large-area graphene based chemical and biological sensors
  • Grant number:
    355863-2011
  • Fiscal year:
    2015
  • Funding amount:
    $529,200
  • Project type:
    Discovery Grants Program - Individual
Large-area graphene based chemical and biological sensors
  • Grant number:
    355863-2011
  • Fiscal year:
    2014
  • Funding amount:
    $529,200
  • Project type:
    Discovery Grants Program - Individual
Theoretical simulation and experimental study on biological weathering mechanism of the rock around coastal area in Yaeyama Islands
  • Grant number:
    26790079
  • Fiscal year:
    2014
  • Funding amount:
    $529,200
  • Project type:
    Grant-in-Aid for Young Scientists (B)