Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
Basic information
- Grant number: RGPIN-2018-05147
- Amount: $22,600
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2019
- Funding country: Canada
- Period: 2019-01-01 to 2020-12-31
- Status: Completed
Project abstract
Understanding the genetic basis of phenotypic changes and predicting phenotypes from genotypes are long-standing goals of genetics. In the era of genomic big data, scalable tools, i.e., software that can rapidly analyze very large datasets using the limited memory of a personal computer, are an emerging requirement. The long-term goal of my research program is to develop novel statistical models and their scalable implementations to facilitate genotype-phenotype mapping and prediction.

Background. Recent advances in high-throughput sequencing technologies, including whole-exome sequencing, RNA-Seq (for the transcriptome) and Bisulfite-Seq (for the methylome), have generated excitement in genetics-related fields. However, tools are lacking that seamlessly integrate multi-scale omics datasets to predict phenotypes precisely in a biologically relevant and meaningful context. In particular, gene-gene interactions have not been fully characterized or used in predictors. Moreover, scalable tools that permit effective statistical analysis of large datasets without a machine with very large memory are also lacking.

Objectives. Building on my previous work on identifying gene-gene interactions and implementing scalable software, I will focus on three short-term objectives: (1) Identify gene interactions from genotype-phenotype data by integrating multi-scale omics data; Bayesian networks and frequent itemset mining will be combined to achieve this goal. (2) Build a polygenic phenotype predictor that incorporates gene interactions; an extension of Group LASSO will be implemented for this purpose. (3) Create scalable software implementing these statistical models using disk-based solutions, i.e., memory virtualization and huge-page techniques from computer science. The software will store large data on disk while allowing rapid calculation as if the data resided in main memory. Multi-scale omics data from the 1,001 Arabidopsis Genomes Project and multi-scale omics plant data generated in Alberta will be used.

Impact. The proposed research will not only provide a new theoretical framework for statistical genetics, but will also furnish novel computational tools to assist experimental scientists carrying out gene mapping projects. Moreover, it will enable practitioners in agriculture and health to improve phenotype predictions, benefiting Canadian food production and the health system. In practical terms, my software represents a scalable solution for big-data analyses with minimal memory usage. This will be particularly relevant to the many Canadian research groups that do not have immediate access to high-performance computing facilities. This program will train HQP to carry out bioinformatics and biostatistics analyses that make full use of future genomic big data.
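The disk-based idea in objective (3) can be illustrated with a minimal sketch. It is not the project's software: it assumes a hypothetical file layout (a row-major matrix of 0/1/2 genotype codes, one byte per entry) and uses Python's standard `mmap` module as a simple stand-in for the memory virtualization and huge-page techniques named in the abstract. The function names `write_genotypes`, `column_means` and `demo` are invented for illustration.

```python
import mmap
import os
import tempfile


def write_genotypes(path, rows):
    """Write a row-major matrix of 0/1/2 genotype codes, one byte per entry.

    The layout is an assumption made for this sketch only.
    """
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(bytes(row))


def column_means(path, n_samples, n_snps):
    """Mean genotype code per SNP, read through a memory map.

    Only one row is copied into Python memory at a time; the operating
    system pages the file in on demand.
    """
    sums = [0] * n_snps
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for i in range(n_samples):
                # Slicing the map copies just this row's bytes.
                row = mm[i * n_snps:(i + 1) * n_snps]
                for j, code in enumerate(row):
                    sums[j] += code
    return [s / n_samples for s in sums]


def demo():
    """Round-trip a tiny 3-sample, 4-SNP matrix through a temporary file."""
    rows = [[0, 1, 2, 0], [1, 1, 2, 0], [2, 1, 2, 0]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "genotypes.bin")
        write_genotypes(path, rows)
        return column_means(path, n_samples=3, n_snps=4)
```

Calling `demo()` writes the small matrix to a temporary file and computes the per-SNP means from the mapped file. The same access pattern applies to a file far larger than main memory, because the calculation addresses the file as if it were an in-memory buffer while never holding more than one row at once.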
Project outputs
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Zhang, Qingrun
Stabilized COre gene and Pathway Election uncovers pan-cancer shared pathways and a cancer-specific driver.
- DOI: 10.1126/sciadv.abo2846
- Published: 2022-12-21
- Impact factor: 13.6
- Authors: Kossinna, Pathum; Cai, Weijia; Lu, Xuewen; Shemanko, Carrie S.; Zhang, Qingrun
- Corresponding author: Zhang, Qingrun
JAWAMix5: an out-of-core HDF5-based Java implementation of whole-genome association studies using mixed models
- DOI: 10.1093/bioinformatics/btt122
- Published: 2013-05-01
- Impact factor: 5.8
- Authors: Long, Quan; Zhang, Qingrun; Nordborg, Magnus
- Corresponding author: Nordborg, Magnus
Universal primers for HBV genome DNA amplification across subtypes: a case study for designing more effective viral primers (Retracted article. See vol 4, pg Nil_1, 2007)
- DOI: 10.1186/1743-422x-4-92
- Published: 2007-09-24
- Impact factor: 4.8
- Authors: Zhang, Qingrun; Wu, Guanghua; Zeng, Changqing
- Corresponding author: Zeng, Changqing
Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden.
- DOI: 10.1038/ng.2678
- Published: 2013-08
- Impact factor: 30.8
- Authors: Long, Quan; Rabanal, Fernando A.; Meng, Dazhe; Huber, Christian D.; Farlow, Ashley; Platzer, Alexander; Zhang, Qingrun; Vilhjalmsson, Bjarni J.; Korte, Arthur; Nizhynska, Viktoria; Voronin, Viktor; Korte, Pamela; Sedman, Laura; Mandakova, Terezie; Lysak, Martin A.; Seren, Uemit; Hellmann, Ines; Nordborg, Magnus
- Corresponding author: Nordborg, Magnus
A second generation human haplotype map of over 3.1 million SNPs.
- DOI: 10.1038/nature06258
- Published: 2007-10-18
- Impact factor: 64.8
- Authors: Frazer, Kelly A.; Ballinger, Dennis G.; Cox, David R.; Hinds, David A.; Stuve, Laura L.; Gibbs, Richard A.; Belmont, John W.; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M.; Pasternak, Shiran; Wheeler, David A.; Willis, Thomas D.; Yu, Fuli; Yang, Huanming; Zeng, Changqing; Gao, Yang; Hu, Haoran; Hu, Weitao; Li, Chaohua; Lin, Wei; Liu, Siqi; Pan, Hao; Tang, Xiaoli; Wang, Jian; Wang, Wei; Yu, Jun; Zhang, Bo; Zhang, Qingrun; Zhao, Hongbin; Zhao, Hui; Zhou, Jun; Gabriel, Stacey B.; Barry, Rachel; Blumenstiel, Brendan; Camargo, Amy; Defelice, Matthew; Faggart, Maura; Goyette, Mary; Gupta, Supriya; Moore, Jamie; Nguyen, Huy; Onofrio, Robert C.; Parkin, Melissa; Roy, Jessica; Stahl, Erich; Winchester, Ellen; Ziaugra, Liuda; Altshuler, David; Shen, Yan; Yao, Zhijian; Huang, Wei; Chu, Xun; He, Yungang; Jin, Li; Liu, Yangfan; Shen, Yayun; Sun, Weiwei; Wang, Haifeng; Wang, Yi; Wang, Ying; Xiong, Xiaoyan; Xu, Liang; Waye, Mary M. Y.; Tsui, Stephen K. W.; Wong, J. Tze-Fei; Galver, Luana M.; Fan, Jian-Bing; Gunderson, Kevin; Murray, Sarah S.; Oliphant, Arnold R.; Chee, Mark S.; Montpetit, Alexandre; Chagnon, Fanny; Ferretti, Vincent; Leboeuf, Martin; Olivier, Jean-François; Phillips, Michael S.; Roumy, Stephanie; Sallee, Clementine; Verner, Andrei; Hudson, Thomas J.; Kwok, Pui-Yan; Cai, Dongmei; Koboldt, Daniel C.; Miller, Raymond D.; Pawlikowska, Ludmila; Taillon-Miller, Patricia; Xiao, Ming; Tsui, Lap-Chee; Mak, William; Song, You Qiang; Tam, Paul K. H.; Nakamura, Yusuke; Kawaguchi, Takahisa; Kitamoto, Takuya; Morizono, Takashi; Nagashima, Atsushi; Ohnishi, Yozo; Sekine, Akihiro; Tanaka, Toshihiro; Tsunoda, Tatsuhiko; Deloukas, Panos; Bird, Christine P.; Delgado, Marcos; Dermitzakis, Emmanouil T.; Gwilliam, Rhian; Hunt, Sarah; Morrison, Jonathan; Powell, Don; Stranger, Barbara E.; Whittaker, Pamela; Bentley, David R.; Daly, Mark J.; de Bakker, Paul I. W.; Barrett, Jeff; Chretien, Yves R.; Maller, Julian; McCarroll, Steve; Patterson, Nick; Pe'er, Itsik; Price, Alkes; Purcell, Shaun; Richter, Daniel J.; Sabeti, Pardis; Saxena, Richa; Schaffner, Stephen F.; Sham, Pak C.; Varilly, Patrick; Altshuler, David; Stein, Lincoln D.; Krishnan, Lalitha; Smith, Albert Vernon; Tello-Ruiz, Marcela K.; Thorisson, Gudmundur A.; Chakravarti, Aravinda; Chen, Peter E.; Cutler, David J.; Kashuk, Carl S.; Lin, Shin; Abecasis, Goncalo R.; Guan, Weihua; Li, Yun; Munro, Heather M.; Qin, Zhaohui Steve; Thomas, Daryl J.; McVean, Gilean; Auton, Adam; Bottolo, Leonardo; Cardin, Niall; Eyheramendy, Susana; Freeman, Colin; Marchini, Jonathan; Myers, Simon; Spencer, Chris; Stephens, Matthew; Donnelly, Peter; Cardon, Lon R.; Clarke, Geraldine; Evans, David M.; Morris, Andrew P.; Weir, Bruce S.; Tsunoda, Tatsuhiko; Johnson, Todd A.; Mullikin, James C.; Sherry, Stephen T.; Feolo, Michael; Skol, Andrew
- Corresponding author: Skol, Andrew
Other grants held by Zhang, Qingrun
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
- Grant number: RGPIN-2018-05147
- Fiscal year: 2022
- Amount: $22,600
- Program: Discovery Grants Program - Individual
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
- Grant number: RGPIN-2018-05147
- Fiscal year: 2021
- Amount: $22,600
- Program: Discovery Grants Program - Individual
A GPU Server for Integration of Machine Learning in Mathematics and Statistics Research and Training
- Grant number: RTI-2021-00675
- Fiscal year: 2020
- Amount: $22,600
- Program: Research Tools and Instruments
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
- Grant number: RGPIN-2018-05147
- Fiscal year: 2020
- Amount: $22,600
- Program: Discovery Grants Program - Individual
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
- Grant number: RGPIN-2018-05147
- Fiscal year: 2018
- Amount: $22,600
- Program: Discovery Grants Program - Individual
Statistical models and computational tools for gene-gene interaction analyses by utilizing multi-scale omics
- Grant number: DGECR-2018-00061
- Fiscal year: 2018
- Amount: $22,600
- Program: Discovery Launch Supplement
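Similar grants from the National Natural Science Foundation of China (NSFC)
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Approval year: 2024
- Program: Collaborative Innovation Research Team
Sources and formation mechanisms of haze in southern Hebei
- Grant number: 41105105
- Approval year: 2011
- Amount: CNY 250,000
- Program: Young Scientists Fund
Insurance risk models, investment portfolios, and related topics
- Grant number: 10971157
- Approval year: 2009
- Amount: CNY 240,000
- Program: General Program
Regulation of the ERK signaling pathway by RKTG and its effect on tumorigenesis
- Grant number: 30830037
- Approval year: 2008
- Amount: CNY 1,900,000
- Program: Key Program
Synthesis of novel chiral NAD(P)H models and their biochemical simulation
- Grant number: 20472090
- Approval year: 2004
- Amount: CNY 230,000
- Program: General Program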
Similar overseas grants
SCH: Novel and Interpretable Statistical Learning for Brain Images in AD/ADRDs
- Grant number: 10816764
- Fiscal year: 2023
- Amount: $22,600
Statistical methods to characterize causal mechanisms by which air pollution affects the recurrence of cardiovascular events
- Grant number: 10660281
- Fiscal year: 2023
- Amount: $22,600
Statistical methods for population-level cell-type-specific analyses of tissue omics data for Alzheimer's disease
- Grant number: 10589254
- Fiscal year: 2023
- Amount: $22,600
Statistical models for the integrative analysis of complex biomedical images with manifold structure
- Grant number: 10590469
- Fiscal year: 2023
- Amount: $22,600
Statistical methods for longitudinal integrated mechanistic modeling of multiview data
- Grant number: 10445698
- Fiscal year: 2022
- Amount: $22,600
CORE 1/2: INIA Stress and Chronic Alcohol Interactions: Computational and Statistical Analysis Core (CSAC)
- Grant number: 10411629
- Fiscal year: 2022
- Amount: $22,600
Statistical methods for longitudinal integrated mechanistic modeling of multiview data
- Grant number: 10685565
- Fiscal year: 2022
- Amount: $22,600
Core B: Statistical and Computational Analysis Core
- Grant number: 10698077
- Fiscal year: 2022
- Amount: $22,600
Efficient Statistical and Computational Methods for Genetics and Dynamical Models
- Grant number: RGPIN-2019-06131
- Fiscal year: 2022
- Amount: $22,600
- Program: Discovery Grants Program - Individual
Statistical and Computational Optimisation In Pricing Models (Insurance)
- Grant number: 2669153
- Fiscal year: 2022
- Amount: $22,600
- Program: Studentship