Effective prediction of microRNAs in the face of class imbalance
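Basic information
- Grant number: RGPIN-2016-06179
- Principal investigator:
- Amount: $16,000
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Period: 2020-01-01 to 2021-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract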
MicroRNA (miRNA) are short, expressed genomic sequences that encode small RNA molecules adopting a “hairpin” secondary structure. Computational prediction of miRNA is important because miRNA are now believed to disrupt or otherwise control the expression of 60-90% of mammalian genes. Sequence-based de novo prediction of miRNA is difficult because of acute class imbalance: for each true miRNA within a genome, we expect 1000 pseudo-miRNA (i.e. genomic regions producing miRNA-like hairpin structures). Effective miRNA prediction systems must therefore have extremely high specificity (i.e. the ability to reject pseudo-miRNA) while retaining the ability to correctly detect true miRNA (i.e. recall).
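To make the imbalance concrete, the following minimal Python sketch computes the precision implied by a given recall and specificity when each true miRNA is accompanied by 1000 pseudo-miRNA. The function name `expected_precision` and the example recall of 0.9 are illustrative assumptions, not values taken from the project.

```python
def expected_precision(recall: float, specificity: float,
                       negatives_per_positive: float) -> float:
    """Expected precision given recall, specificity, and the class ratio.

    All quantities are expressed per single true positive example, so
    `negatives_per_positive` = 1000 models the 1:1000 imbalance.
    """
    true_positives = recall
    false_positives = (1.0 - specificity) * negatives_per_positive
    denominator = true_positives + false_positives
    if denominator == 0.0:
        return 0.0  # nothing was predicted positive
    return true_positives / denominator


# Hypothetical operating points: fixed recall, increasing specificity.
for specificity in (0.99, 0.999, 0.9999):
    precision = expected_precision(recall=0.9, specificity=specificity,
                                   negatives_per_positive=1000)
    print(f"specificity={specificity} precision={precision:.3f}")
```

Under these assumptions, a specificity of 0.99 gives a precision of roughly 8%, 0.999 gives roughly 47%, and 0.9999 gives roughly 90%, which is why specificity must be extremely high before predictions become trustworthy.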
We have recently introduced the Species-specific miRNA Prediction (SMIRP) framework for training highly effective species-specific miRNA prediction systems. When SMIRP is applied to three popular miRNA prediction methods, we observe significant improvements in precision (i.e. the proportion of predictions expected to be true miRNA) while maintaining the same high recall rates achieved by the original methods. We propose to extend our research in three key areas:
1) Existing miRNA prediction methods perform well on canonical pre-miRNA, but are not well-suited for high-throughput annotation of entire genomes. Therefore, new classification techniques will be developed which optimally differentiate between real and pseudo-miRNA sequences within predicted hairpin structures. This will include the development of novel methods to compute general-purpose information-rich DNA/RNA descriptors. In addition to miRNA prediction, these descriptors will benefit other nucleic acid classification problem domains.
2) With the increasing availability of transcriptomic data, there is a need and an opportunity to develop an integrated miRNA discovery pipeline that leverages both next-generation sequencing (NGS) read patterns and powerful sequence-based methods such as SMIRP. We will develop and apply advanced machine learning approaches to optimally combine NGS- and sequence-based approaches, improving our ability to discover novel miRNA of potential importance to human health.
3) Contributions will also be made in the broader field of machine learning in the presence of extreme class imbalance, where many classic performance metrics, such as ROC curves, become inappropriate because they do not adequately reflect the impact of false positive predictions. To address this and other issues, we will develop novel performance metrics for cases of acute class imbalance. While these new metrics will find immediate application in the development of miRNA prediction tools, they will also be widely applicable to other problem domains within bioinformatics and beyond.
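The concern about ROC curves in the third area can be illustrated with a small simulation. The sketch below runs on synthetic Gaussian scores (an assumption, not project data) and uses average precision, a standard existing metric, as the point of comparison; it does not implement the novel metrics proposed above. The helper names `roc_auc`, `average_precision`, and `simulate` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) statistic; assumes no tied scores."""
    ranked = sorted(zip(scores, labels))  # ascending by score
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(rank for rank, (_, label) in enumerate(ranked, start=1)
                       if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each true positive.

    Assumes at least one positive label.
    """
    ranked = sorted(zip(scores, labels), reverse=True)  # descending by score
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def simulate(n_pos=50, n_neg=50_000, seed=0):
    """Synthetic 1:1000 data set: positives score higher on average."""
    rng = random.Random(seed)
    scores = [rng.gauss(2.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return labels, scores


labels, scores = simulate()
print(f"ROC AUC           = {roc_auc(labels, scores):.3f}")
print(f"average precision = {average_precision(labels, scores):.3f}")
```

In this setting the ROC AUC looks strong while the average precision stays far lower, because the false positive rate is normalized by the very large negative class and so hides the absolute number of false positives.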
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Green, James
Citations and science
- DOI: 10.1007/s11096-017-0539-y
- Published: 2017-10-01
- Journal:
- Impact factor: 2.4
- Authors: van Mil, J. W. Foppe; Green, James
- Corresponding author: Green, James
Internet use in an orthopaedic outpatient population
- DOI: 10.1097/bco.0b013e31828e542b
- Published: 2013-05-01
- Journal:
- Impact factor: 0.3
- Authors: Baker, Joseph F.; Green, James; Mulhall, Kevin J.
- Corresponding author: Mulhall, Kevin J.
Critical Role of the Virus-Encoded MicroRNA-155 Ortholog in the Induction of Marek's Disease Lymphomas
- DOI: 10.1371/journal.ppat.1001305.s001
- Published: 2011-01-01
- Journal:
- Impact factor: 0
- Authors: Green, James; Petherbridge, Lawrence; Kgosana, Lydia
- Corresponding author: Kgosana, Lydia
Child pedestrian casualties and deprivation
- DOI: 10.1016/j.aap.2010.10.016
- Published: 2011-05-01
- Journal:
- Impact factor: 5.9
- Authors: Green, James; Muir, Helen; Maher, Mike
- Corresponding author: Maher, Mike
Quality and Variability of Patient Directions in Electronic Prescriptions in the Ambulatory Care Setting.
- DOI: 10.18553/jmcp.2018.17404
- Published: 2018-07
- Journal:
- Impact factor: 2.1
- Authors: Yang, Yuze; Ward-Charlerie, Stacy; Dhavle, Ajit A.; Rupp, Michael T.; Green, James
- Corresponding author: Green, James
Other grants held by Green, James
Reciprocal Perspective Machine Learning to Identify Relationships in Sparse Biological Networks
- Grant number: RGPIN-2021-04184
- Fiscal year: 2022
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2022-04761
- Fiscal year: 2022
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
Unobtrusive neonatal patient monitoring using video and pressure data
- Grant number: 543940-2019
- Fiscal year: 2021
- Funding amount: $16,000
- Program: Collaborative Research and Development Grants
Reciprocal Perspective Machine Learning to Identify Relationships in Sparse Biological Networks
- Grant number: RGPIN-2021-04184
- Fiscal year: 2021
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2016-04946
- Fiscal year: 2021
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2016-04946
- Fiscal year: 2020
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
Unobtrusive neonatal patient monitoring using video and pressure data
- Grant number: 543940-2019
- Fiscal year: 2020
- Funding amount: $16,000
- Program: Collaborative Research and Development Grants
Effective prediction of microRNAs in the face of class imbalance
- Grant number: RGPIN-2016-06179
- Fiscal year: 2019
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
Unobtrusive neonatal patient monitoring using video and pressure data
- Grant number: 543940-2019
- Fiscal year: 2019
- Funding amount: $16,000
- Program: Collaborative Research and Development Grants
Metal Mediated and Catalyzed Organic Synthetic Methods
- Grant number: RGPIN-2016-04946
- Fiscal year: 2019
- Funding amount: $16,000
- Program: Discovery Grants Program - Individual
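Similar grants from the National Natural Science Foundation of China (NSFC)
Non-invasive detection and depth prediction of deep lesions at safe light doses based on deep-penetrating Raman spectroscopy
- Grant number: 82372016
- Approval year: 2023
- Funding amount: 480,000 CNY
- Program: General Program
Precise theoretical predictions for the production and decay of Higgs boson pairs at hadron colliders
- Grant number: 12375076
- Approval year: 2023
- Funding amount: 520,000 CNY
- Program: General Program
Theoretical predictions of distribution functions for the three-dimensional structure of hadrons
- Grant number: 12375080
- Approval year: 2023
- Funding amount: 520,000 CNY
- Program: General Program
Design and prediction of multi-component alloy system materials based on machine learning methods
- Grant number: 92270104
- Approval year: 2022
- Funding amount: 750,000 CNY
- Program: Major Research Plan
Next-to-next-to-leading-order anti-kT and similar jet functions, and predictions for forward jet production in pA collisions
- Grant number: 12175016
- Approval year: 2021
- Funding amount: 630,000 CNY
- Program: General Program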
Similar overseas grants
The cardiovascular consequences of sleep apnea plus COPD (Overlap syndrome)
- Grant number: 10733384
- Fiscal year: 2023
- Funding amount: $16,000
- Program:
Identifying multimodal biomarkers for autologous serum tears in the treatment of chronic postoperative ocular pain
- Grant number: 10794761
- Fiscal year: 2023
- Funding amount: $16,000
- Program:
Digital Multiplexed Analysis of Circulating Nucleic Acids in Small-Volume Blood Specimens
- Grant number: 10467839
- Fiscal year: 2022
- Funding amount: $16,000
- Program:
Lung Cancer Early Detection and Immunotherapy Response Prediction and Monitoring with an Exo-PROS Liquid Biopsy Assay
- Grant number: 10665754
- Fiscal year: 2022
- Funding amount: $16,000
- Program: