CAREER: Developing New Computational Methods to Address the Missing Data Problem in Population Genomics

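Basic information

  • Award number:
    2147812
  • Principal investigator:
  • Amount:
    $608,400
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-08-15 to 2026-04-30
  • Project status:
    Not yet concluded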

Project abstract

Population genomic data are becoming increasingly affordable and accessible, causing a sudden data explosion in the field of evolutionary biology. With this increase in data generation comes another important issue: the missing data problem. Data can be missing because they are unobserved (e.g., due to the sampling method), observed incorrectly (e.g., due to errors in the method of observation), or impossible to observe (e.g., due to extinction). Missing data are often not accounted for and can lead to incorrect conclusions in population genomics research. This project will build bioinformatics software to address these three missing data problems. The statistical framework and methods developed by this project will be utilized extensively by evolutionary biologists in a variety of fields. Additionally, this project will develop accessible software pipelines and curricular material for recruiting and retaining underrepresented groups into computer programming and bioinformatics at a variety of levels (K-12, undergraduate, graduate, post-graduate).

Population genomic data are considered missing due to (1) sequencing or genotyping errors, (2) systematic bias in the generation of genotyping libraries (e.g., from techniques such as restriction-associated DNA sequencing (RADseq)), or (3) the absence of genomic data from unsampled, perhaps extinct, “ghost” populations. This project will develop a series of tools to address all three missing data problems by treating missing data as an unobserved variable in statistical models for the estimation of population genetic parameters and evolutionary history from genomic data. Specifically, we will (1) build a parallelized statistical framework for estimating population genetic structure from multi-allelic, multi-locus genomic data that incorporates missing data into a maximum likelihood framework, (2) systematically explore RADseq data sets, using extensive simulations and a meta-analysis of published studies, to both quantify and account for how missing data due to “lost” polymorphisms at restriction sites bias the estimation of evolutionary history, and (3) develop a statistical model to classify genomic loci as having introgressed from extant or from “ghost” populations based on their coalescent histories under the Isolation with Migration (IM) model. This work will form the basis of a set of robust tools that will be utilized by evolutionary biologists in a variety of fields to systematically assess and account for the effects of missing data in their population genomic data sets.

This CAREER grant will also strengthen University-public partnerships through (1) week-long summer bioinformatics workshops for high-school biology teachers in the Philadelphia and San Diego areas, (2) development of curricular material for The Galaxy Project, the Conservation Genomics Workshop at the University of Montana, and the California State University Program for Education and Research in Biotechnology (CSUPERB), and (3) recruitment and retention of underrepresented student scholars into genomics research. All curricular material, software, and pipelines developed will be shared via the PI’s GitHub page: www.github.com/arunsethuraman.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
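
To make the idea of treating missing data as an unobserved variable in a maximum likelihood framework concrete, here is a minimal sketch. It is not the project's software. It assumes a single population, one multi-allelic locus, independent allele copies, and missingness that is unrelated to the allele itself (missing at random). The names `MISSING`, `log_likelihood`, and `mle_frequencies` are hypothetical. Under these assumptions a missing copy is summed over every possible allele, which contributes a factor of 1 to the likelihood.

```python
import math
from collections import Counter

MISSING = None  # hypothetical marker for an allele copy that was not observed


def log_likelihood(alleles, freqs):
    """Log-likelihood of the allele copies at one locus.

    Missing copies are marginalized out: summing the allele probability
    over every possible allele gives 1, so they add log(1) = 0.
    """
    total = 0.0
    for allele in alleles:
        if allele is MISSING:
            continue  # marginal probability is 1 under missing-at-random
        total += math.log(freqs[allele])
    return total


def mle_frequencies(alleles):
    """Maximum likelihood allele frequencies from the observed copies only."""
    counts = Counter(a for a in alleles if a is not MISSING)
    n_observed = sum(counts.values())
    if n_observed == 0:
        raise ValueError("no observed allele copies at this locus")
    return {allele: count / n_observed for allele, count in counts.items()}


# Ten allele copies at one locus; three were not observed.
locus = ["A", "A", "B", MISSING, "C", "A", MISSING, "B", "A", MISSING]
freqs = mle_frequencies(locus)
print(freqs)
print(log_likelihood(locus, freqs))
```

This baseline only holds when missingness is ignorable. The cases named in the abstract (genotyping error, allele loss at restriction sites in RADseq, and unsampled “ghost” populations) are ones where missingness depends on the underlying data, so a model for them would have to include the missingness mechanism itself.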

Project outcomes

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A high-quality genome of the convergent lady beetle, Hippodamia convergens
  • DOI:
    10.1093/g3journal/jkae083
  • Published:
    2024-05-08
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    Ang,Gavrila;Zhang,Andrew;Sethuraman,Arun
  • Corresponding author:
    Sethuraman,Arun
Phenotypic differentiation despite gene flow: Beak morphology, bite performance, and population genetics of Loggerhead Shrikes (Lanius ludovicianus)
  • DOI:
    10.1002/ece3.11079
  • Published:
    2024
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    Sustaita, Diego;Wulf, Gwendalyn K.;Sethuraman, Arun
  • Corresponding author:
    Sethuraman, Arun
Lack of phenotypic variation despite population structure in larval utilization of pea aphids by populations of the lady beetle Hippodamia convergens
  • DOI:
    10.1016/j.biocontrol.2020.104507
  • Published:
    2021-04-01
  • Journal:
  • Impact factor:
    4.2
  • Authors:
    Grenier, Christy;Summerhays, Bryce;Sethuraman, Arun
  • Corresponding author:
    Sethuraman, Arun

Other publications by Arun Sethuraman

The Popgen Pipeline Platform: A Software Platform for Facilitating Population Genomic Analyses
  • DOI:
    10.1101/785774
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. Webb;Jared G. Knoblauch;Nitesh Sabankar;Apeksha Sukesh Kallur;J. Hey;Arun Sethuraman
  • Corresponding author:
    Arun Sethuraman
Fire and post-fire management alters soil microbial abundance and activity: A case study in semi-arid shrubland soils
Coccinellid host morphology dictates morphological diversity of the parasitoid wasp Dinocampus coccinellae
  • DOI:
    10.1016/j.biocontrol.2019.02.015
  • Published:
    2019-06-01
  • Journal:
  • Impact factor:
  • Authors:
    Hannah Vansant;Yumary M. Vasquez;John J. Obrycki;Arun Sethuraman
  • Corresponding author:
    Arun Sethuraman
The Effects of Gene Flow from Unsampled ‘Ghost’ Populations on the Estimation of Evolutionary History under the Isolation with Migration Model
  • DOI:
    10.1101/733600
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Melissa Lynch;Arun Sethuraman
  • Corresponding author:
    Arun Sethuraman
Characterizing the microbial metagenome of calcareous stromatolite formations in the San Felipe Creek in Anza Borrego Desert
  • DOI:
    10.1101/2023.05.12.540589
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    R. Stancheva;Arun Sethuraman;Hossein Khadivar;Jenna Archambeau;Ella Caughran;A. Chang;Bradford Hunter;Christian Ihenyen;Marvin Onwukwe;Dariana Palacios;Chloe La Prairie;Nicole Read;Julianna Tsang;Brianna Vega;Cristina Velasquez;Xiaoyu Zhang;E. Becket;B. Read
  • Corresponding author:
    B. Read

Other grants by Arun Sethuraman

ABI Development: Improved Tools for Population Genomics
  • Award number:
    2203184
  • Fiscal year:
    2021
  • Award amount:
    $608,400
  • Award type:
    Standard Grant
CAREER: Developing New Computational Methods to Address the Missing Data Problem in Population Genomics
  • Award number:
    2042516
  • Fiscal year:
    2021
  • Award amount:
    $608,400
  • Award type:
    Continuing Grant
ABI Development: Improved Tools for Population Genomics
  • Award number:
    1564659
  • Fiscal year:
    2016
  • Award amount:
    $608,400
  • Award type:
    Standard Grant
ABI Development: Improved Tools for Population Genomics
  • Award number:
    1664918
  • Fiscal year:
    2016
  • Award amount:
    $608,400
  • Award type:
    Standard Grant

Similar overseas grants

Developing and Testing Innovations: Computer Science Through Engineering Design in New York
  • Award number:
    2341962
  • Fiscal year:
    2024
  • Award amount:
    $608,400
  • Award type:
    Standard Grant
Developing new tests and treatments to enable prevention of osteoarthritis.
  • Award number:
    MR/Y003470/1
  • Fiscal year:
    2024
  • Award amount:
    $608,400
  • Award type:
    Fellowship
Developing a new method for the identification of cancer in archaeological populations
  • Award number:
    2341415
  • Fiscal year:
    2024
  • Award amount:
    $608,400
  • Award type:
    Standard Grant
EAGER: IMPRESS-U: Developing new approaches and structural materials to rebuild damaged Ukrainian infrastructure with environmental sustainability considerations
  • Award number:
    2412196
  • Fiscal year:
    2024
  • Award amount:
    $608,400
  • Award type:
    Standard Grant
Developing a new generation of tools for predicting novel AMR mutation profiles using generative AI
  • Award number:
    BB/Z514305/1
  • Fiscal year:
    2024
  • Award amount:
    $608,400
  • Award type:
    Research Grant
GOALI: Developing New Hydrogen Isotope Exchange Strategies for Isotope Labelling of Pharmaceuticals
  • Award number:
    2247057
  • Fiscal year:
    2023
  • Award amount:
    $608,400
  • Award type:
    Standard Grant
Developing and exploring methods to understand human-nature interactions in urban areas using new forms of big data
  • Award number:
    ES/W012979/1
  • Fiscal year:
    2023
  • Award amount:
    $608,400
  • Award type:
    Research Grant
Developing a new risk and needs assessment tool for young people who have displayed harmful sexual behaviour
  • Award number:
    2886506
  • Fiscal year:
    2023
  • Award amount:
    $608,400
  • Award type:
    Studentship
Developing new therapeutic strategies for brain metastasis
  • Award number:
    10578405
  • Fiscal year:
    2023
  • Award amount:
    $608,400
  • Award type:
Developing and evaluating new measures of family availability to provide care to people with dementia
  • Award number:
    10728725
  • Fiscal year:
    2023
  • Award amount:
    $608,400
  • Award type: