Improving detection in high-throughput sequencing data with gene/locus-specific models
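Basic information
- Grant number: RGPIN-2019-06604
- Principal investigator:
- Amount: $29,900
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Start and end dates: 2021-01-01 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract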
The field of bioinformatics has borrowed from other fields of computer science and mathematics, such as statistics, machine learning, probabilistic modelling, and optimization, to develop sound, general algorithms for analyzing high-throughput genetic and molecular data. However, almost without exception, those algorithms treat every "entity" under consideration the same. For example, to identify which genes are differentially expressed between two conditions, the same statistical model is applied individually to every gene. When we want to identify regions of the genomic DNA bound by a certain protein, the same statistical model is applied to every genomic locus. Of course, the observed data for each gene or each locus, provided by the high-throughput assay, is different. But the test applied is the same, and for a simple reason: traditionally, bioinformatics has dealt with situations where the number of observations per entity (e.g. two conditions, a handful of time points, or a few tens of patients) is vastly outnumbered by the number of entities (e.g. tens of thousands of genes or millions of genomic loci). Anything but simple statistical models would be in danger of overfitting the sparse data available. However, the accumulation of massive public databases of personal genomes, epigenomes, and cell-, tissue-, and disease-specific expression profiles means that we now have at our disposal high-throughput data from tens or hundreds of thousands of "conditions". Moreover, statistical analyses of such data reveal a startling fact: not all genes and genomic loci are alike. For example, the expression of some genes is inherently more variable than that of others. Furthermore, our measurements of some genes are noisier and/or more systematically biased than our measurements of others. The same holds for genomic loci, where signal-to-noise ratios vary across assays, as do the sources and amounts of measurement bias.
The central idea behind this proposal is to use that mass of already-collected data to build and test more sophisticated, machine learning-based models of every single gene or locus in the genome. Further, we can use those models not just for the sake of analyzing that same data, but rather for creating tools to analyze new datasets, whatever their size. By modelling the particular biases and variability of each gene or locus, we can get a more accurate measure of the novelty of new measurements, and more successfully identify truly significant alterations in gene and genome behaviour.
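The idea of scoring new measurements against gene-specific rather than shared models can be illustrated with a minimal sketch. It is not the project's method: it assumes a simple Gaussian model per gene, fitted to a simulated reference compendium, and all names (`fit_gene_models`, `gene_specific_z`, `pooled_z`) and numbers are hypothetical. It shows that the same shift of +1 is highly unusual for a stable gene and unremarkable for a variable one, a distinction that a single pooled variance hides.

```python
import numpy as np

def fit_gene_models(reference):
    """Fit one Gaussian per gene from a (n_conditions, n_genes) reference array."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)  # sample standard deviation per gene
    return mean, std

def gene_specific_z(new_sample, mean, std):
    """Score a new sample using each gene's own variability."""
    return (new_sample - mean) / std

def pooled_z(new_sample, mean, std):
    """Score a new sample using one variance shared by all genes."""
    pooled_std = np.sqrt(np.mean(std ** 2))
    return (new_sample - mean) / pooled_std

# Simulated reference compendium (assumption: log-scale expression values).
rng = np.random.default_rng(0)
n_conditions = 5000
true_sd = np.array([0.1, 2.0])  # gene 0 is stable, gene 1 is inherently variable
reference = rng.normal(loc=5.0, scale=true_sd, size=(n_conditions, 2))

mean, std = fit_gene_models(reference)

# A new measurement in which both genes are shifted by +1 from baseline.
new_sample = np.array([6.0, 6.0])

z_specific = gene_specific_z(new_sample, mean, std)
z_pooled = pooled_z(new_sample, mean, std)

print("gene-specific z-scores:", z_specific)
print("pooled z-scores:       ", z_pooled)
```

With gene-specific models the stable gene receives a much larger score than the variable gene, whereas the pooled model assigns both nearly the same score. A realistic model would also need to account for measurement bias and non-Gaussian count data, which this sketch ignores.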
Project outcomes
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other publications by Perkins, Theodore
Human gene expression variability and its dependence on methylation and aging
- DOI: 10.1186/s12864-019-6308-7
- Published: 2019-12-07
- Journal:
- Impact factor: 4.4
- Authors: Bashkeel, Nasser; Perkins, Theodore; Lee, Jonathan
- Corresponding author: Lee, Jonathan
Other grants by Perkins, Theodore
Improving detection in high-throughput sequencing data with gene/locus-specific models
- Grant number: RGPIN-2019-06604
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Improving detection in high-throughput sequencing data with gene/locus-specific models
- Grant number: RGPIN-2019-06604
- Fiscal year: 2020
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Improving detection in high-throughput sequencing data with gene/locus-specific models
- Grant number: RGPIN-2019-06604
- Fiscal year: 2019
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Inference and Scaling in Stochastic Dynamical Systems
- Grant number: RGPIN-2014-05716
- Fiscal year: 2018
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Inference and Scaling in Stochastic Dynamical Systems
- Grant number: RGPIN-2014-05716
- Fiscal year: 2017
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Inference and Scaling in Stochastic Dynamical Systems
- Grant number: RGPIN-2014-05716
- Fiscal year: 2016
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Inference and Scaling in Stochastic Dynamical Systems
- Grant number: RGPIN-2014-05716
- Fiscal year: 2015
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Inference and Scaling in Stochastic Dynamical Systems
- Grant number: RGPIN-2014-05716
- Fiscal year: 2014
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Systems biology and biological information processing
- Grant number: 328154-2009
- Fiscal year: 2013
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Systems biology and biological information processing
- Grant number: 328154-2009
- Fiscal year: 2012
- Amount: $29,900
- Program: Discovery Grants Program - Individual
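Similar grants from the National Natural Science Foundation of China (NSFC)
Graphon mean field games with partial observation and application to failure detection in distributed systems
- Grant number:
- Year approved: 2025
- Amount: CNY 0
- Program: Provincial/municipal project
Non-invasive detection and depth prediction of deep lesions at safe light doses based on deep-penetration Raman spectroscopy
- Grant number: 82372016
- Year approved: 2023
- Amount: CNY 480,000
- Program: General Program
Screening, identification, and related studies of UPK3A, a gene highly expressed in bladder cancer
- Grant number: 81101922
- Year approved: 2011
- Amount: CNY 230,000
- Program: Young Scientists Fund
Research on intrusion detection systems for wireless sensor networks based on hidden semi-Markov models
- Grant number: 61101083
- Year approved: 2011
- Amount: CNY 250,000
- Program: Young Scientists Fund
Research on image classification methods and their application to pornography monitoring
- Grant number: 61172103
- Year approved: 2011
- Amount: CNY 620,000
- Program: General Program
Instruction-level analysis of web page Trojan penetration attack mechanisms and research on detection methods
- Grant number: 61003217
- Year approved: 2010
- Amount: CNY 180,000
- Program: Young Scientists Fund
Research on ultra-high-speed regular expression matching techniques
- Grant number: 61073184
- Year approved: 2010
- Amount: CNY 120,000
- Program: General Program
Research on hierarchical software fault-tolerance techniques for commercial multi-core processors in low-radiation space environments
- Grant number: 90818016
- Year approved: 2008
- Amount: CNY 500,000
- Program: Major Research Plan
Quantitative study of key problems in fault diagnosis of refrigeration systems
- Grant number: 50876059
- Year approved: 2008
- Amount: CNY 300,000
- Program: General Program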
Similar overseas grants
Improving inferences on health effects of chemical exposures
- Grant number: 10753010
- Fiscal year: 2023
- Amount: $29,900
- Program:
Improving Fragment Based Drug Discovery and the Development of Tools for Chemical Biology through Nanoscale Encapsulation and NMR Spectroscopy
- Grant number: 10419416
- Fiscal year: 2022
- Amount: $29,900
- Program:
Improving Outcomes in Cancer Treatment-Related Cardiotoxicity
- Grant number: 10544975
- Fiscal year: 2022
- Amount: $29,900
- Program:
Improving Outcomes in Cancer Treatment-Related Cardiotoxicity
- Grant number: 10693265
- Fiscal year: 2022
- Amount: $29,900
- Program:
Improving Fragment Based Drug Discovery and the Development of Tools for Chemical Biology through Nanoscale Encapsulation and NMR Spectroscopy
- Grant number: 10707914
- Fiscal year: 2022
- Amount: $29,900
- Program:
Improving detection in high-throughput sequencing data with gene/locus-specific models
- Grant number: RGPIN-2019-06604
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Improving the throughput of diagnosis and treatment of inherited diseases of the retina
- Grant number: 10629340
- Fiscal year: 2020
- Amount: $29,900
- Program:
Improving detection in high-throughput sequencing data with gene/locus-specific models
- Grant number: RGPIN-2019-06604
- Fiscal year: 2020
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Improving the throughput of diagnosis and treatment of inherited diseases of the retina
- Grant number: 10228094
- Fiscal year: 2020
- Amount: $29,900
- Program:
Improving the throughput of diagnosis and treatment of inherited diseases of the retina
- Grant number: 10408112
- Fiscal year: 2020
- Amount: $29,900
- Program: