CAREER: Developing New Computational Methods to Address the Missing Data Problem in Population Genomics
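Basic information
- Award number: 2042516
- Principal investigator:
- Amount: $608,400
- Host institution:
- Host institution country: United States
- Grant type: Continuing Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-05-01 to 2021-09-30
- Status: Completed
- Source:
- Keywords:
Project abstract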
Population genomic data are becoming increasingly affordable and accessible, causing a sudden data explosion in the field of evolutionary biology. With this increase in data generation comes another important issue: the missing data problem. Data may be missing because they are unobserved (e.g., due to the sampling method), observed incorrectly (e.g., due to errors in the method of observation), or impossible to observe (e.g., due to extinction). Missing data are often not accounted for and can lead to incorrect conclusions in population genomics research. This project will build bioinformatics software to address these three missing data problems. The statistical framework and methods developed by this project will be utilized extensively by evolutionary biologists in a variety of fields. Additionally, this project will develop accessible software pipelines and curricular material for recruiting and retaining underrepresented groups into computer programming and bioinformatics at a variety of levels (K-12, undergraduate, graduate, post-graduate).
Population genomic data are considered to be missing due to (1) sequencing or genotyping errors, (2) systematic bias in the generation of genotyping libraries (e.g., from techniques such as restriction site-associated DNA sequencing (RADseq)), or (3) the absence of genomic data from un-sampled, perhaps extinct "ghost" populations. This project will develop a series of tools to address all three missing data problems by accounting for missing data as an unobserved variable in statistical models for the estimation of population genetic parameters and evolutionary history from genomic data. Specifically, we will (1) build a parallelized statistical framework for estimating population genetic structure from multi-allelic, multi-locus genomic data that incorporates missing data into a maximum likelihood framework, (2) systematically explore RADseq data sets, using extensive simulations and a meta-analysis of published studies, to both quantify and account for how missing data due to "lost" polymorphisms at restriction sites bias the estimation of evolutionary history, and (3) develop a statistical model to classify genomic loci as having introgressed from extant or from "ghost" populations based on their coalescent histories under the Isolation with Migration (IM) model. This work will form the basis of a set of robust tools that will be utilized by evolutionary biologists in a variety of fields to systematically assess and account for the effects of missing data in their population genomic data sets. This CAREER grant will also strengthen University-public partnerships through (1) week-long summer bioinformatics workshops for high-school biology teachers in the Philadelphia and San Diego areas, (2) development of curricular material for The Galaxy Project, the Conservation Genomics Workshop at the University of Montana, and the California State University Program for Education and Research in Biotechnology (CSUPERB), and (3) recruitment and retention of underrepresented student scholars into genomics research. All curricular material, software, and pipelines developed will be shared via the PI's GitHub page: www.github.com/arunsethuraman.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
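As an illustration of what "accounting for missing data as an unobserved variable" can mean in a likelihood framework, the following minimal sketch computes the log-likelihood of one individual under a mixture of K clusters. It is not the project's software. The function name, the `MISSING` marker, and the data layout are hypothetical, and it assumes haploid genotypes, independent loci, and data that are missing at random.

```python
import math

MISSING = None  # hypothetical marker for an unobserved allele


def individual_log_likelihood(alleles, mix, freqs):
    """Log-likelihood of one haploid multi-locus genotype under a K-cluster mixture.

    alleles: one allele index per locus, or MISSING where nothing was observed
    mix:     K mixing proportions that sum to 1
    freqs:   freqs[k][l][a] is the frequency of allele a at locus l in cluster k
    """
    total = 0.0
    for k, weight in enumerate(mix):
        prob = weight
        for locus, allele in enumerate(alleles):
            if allele is MISSING:
                # Summing the allele frequencies over every possible allele
                # gives 1, so a missing observation drops out of the product.
                continue
            prob *= freqs[k][locus][allele]
        total += prob
    return math.log(total)
```

Skipping a missing locus is equivalent to summing the likelihood over every allele that could have been observed there, which is why no imputation step is needed under these assumptions.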
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Arun Sethuraman
The Popgen Pipeline Platform: A Software Platform for Facilitating Population Genomic Analyses
- DOI: 10.1101/785774
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: A. Webb; Jared G. Knoblauch; Nitesh Sabankar; Apeksha Sukesh Kallur; J. Hey; Arun Sethuraman
- Corresponding author: Arun Sethuraman
Fire and post-fire management alters soil microbial abundance and activity: A case study in semi-arid shrubland soils
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: G. Vourlitis; Dylan Steinecke; Tanairi Martinez; Karen Konda; Roxana Rendon; Victoria Hall; Sherryca Khor; Arun Sethuraman
- Corresponding author: Arun Sethuraman
Coccinellid host morphology dictates morphological diversity of the parasitoid wasp Dinocampus coccinellae
- DOI: 10.1016/j.biocontrol.2019.02.015
- Published: 2019-06-01
- Journal:
- Impact factor:
- Authors: Hannah Vansant; Yumary M. Vasquez; John J. Obrycki; Arun Sethuraman
- Corresponding author: Arun Sethuraman
The Effects of Gene Flow from Unsampled ‘Ghost’ Populations on the Estimation of Evolutionary History under the Isolation with Migration Model
- DOI: 10.1101/733600
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Melissa Lynch; Arun Sethuraman
- Corresponding author: Arun Sethuraman
Characterizing the microbial metagenome of calcareous stromatolite formations in the San Felipe Creek in Anza Borrego Desert
- DOI: 10.1101/2023.05.12.540589
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: R. Stancheva; Arun Sethuraman; Hossein Khadivar; Jenna Archambeau; Ella Caughran; A. Chang; Bradford Hunter; Christian Ihenyen; Marvin Onwukwe; Dariana Palacios; Chloe La Prairie; Nicole Read; Julianna Tsang; Brianna Vega; Cristina Velasquez; Xiaoyu Zhang; E. Becket; B. Read
- Corresponding author: B. Read
Other grants by Arun Sethuraman
ABI Development: Improved Tools for Population Genomics
- Award number: 2203184
- Fiscal year: 2021
- Funding amount: $608,400
- Grant type: Standard Grant
CAREER: Developing New Computational Methods to Address the Missing Data Problem in Population Genomics
- Award number: 2147812
- Fiscal year: 2021
- Funding amount: $608,400
- Grant type: Continuing Grant
ABI Development: Improved Tools for Population Genomics
- Award number: 1564659
- Fiscal year: 2016
- Funding amount: $608,400
- Grant type: Standard Grant
ABI Development: Improved Tools for Population Genomics
- Award number: 1664918
- Fiscal year: 2016
- Funding amount: $608,400
- Grant type: Standard Grant
Similar overseas grants
Developing and Testing Innovations: Computer Science Through Engineering Design in New York
- Award number: 2341962
- Fiscal year: 2024
- Funding amount: $608,400
- Grant type: Standard Grant
Developing new tests and treatments to enable prevention of osteoarthritis.
- Award number: MR/Y003470/1
- Fiscal year: 2024
- Funding amount: $608,400
- Grant type: Fellowship
Developing a new method for the identification of cancer in archaeological populations
- Award number: 2341415
- Fiscal year: 2024
- Funding amount: $608,400
- Grant type: Standard Grant
EAGER: IMPRESS-U: Developing new approaches and structural materials to rebuild damaged Ukrainian infrastructure with environmental sustainability considerations
- Award number: 2412196
- Fiscal year: 2024
- Funding amount: $608,400
- Grant type: Standard Grant
Developing a new generation of tools for predicting novel AMR mutation profiles using generative AI
- Award number: BB/Z514305/1
- Fiscal year: 2024
- Funding amount: $608,400
- Grant type: Research Grant
GOALI: Developing New Hydrogen Isotope Exchange Strategies for Isotope Labelling of Pharmaceuticals
- Award number: 2247057
- Fiscal year: 2023
- Funding amount: $608,400
- Grant type: Standard Grant
Developing and exploring methods to understand human-nature interactions in urban areas using new forms of big data
- Award number: ES/W012979/1
- Fiscal year: 2023
- Funding amount: $608,400
- Grant type: Research Grant
Developing a new risk and needs assessment tool for young people who have displayed harmful sexual behaviour
- Award number: 2886506
- Fiscal year: 2023
- Funding amount: $608,400
- Grant type: Studentship
Developing new therapeutic strategies for brain metastasis
- Award number: 10578405
- Fiscal year: 2023
- Funding amount: $608,400
- Grant type:
Developing and evaluating new measures of family availability to provide care to people with dementia
- Award number: 10728725
- Fiscal year: 2023
- Funding amount: $608,400
- Grant type: