Interpretable Machine Learning Approaches Applied to Omics Datasets

Basic Information

Grant number:
RGPIN-2022-04262
Principal investigator:
Hussin, Julie
Amount:
$28.4K
Host institution:
Université de Montréal
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Duration:
2022-01-01 to 2023-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=760589
Keywords:
Interpretable Machine Learning Approaches Applied

Project Abstract

In recent decades, advances in sequencing technologies have set off a revolution resulting in an explosion of genetic data, propelling human genomics into the era of big data. Even more recently, novel biotechnologies allow us to obtain information at the molecular level for each individual, such as the concentrations of metabolites (small molecules such as antioxidants, vitamins) or the quantification of RNA transcripts and proteins. These so-called 'omics' data sets, combined with genetics established at birth, hold the promise to reveal the molecular causes of the differences between humans, for traits such as height, weight or risk of developing disease. In parallel, recent advances in artificial intelligence have led to the development of powerful methods for making predictions from large data sets, in several areas from autonomous driving to natural language processing. However, machine learning technologies on individual omics signatures are lagging behind, because several challenges still need to be addressed. First, these "black box" methods produce predictions without providing interpretations, preventing experts from getting the evidence necessary to validate the results. Second, current methods tend to learn the datasets "by heart" instead of extracting general knowledge from them. Indeed, although our datasets are large, they contain far fewer participants than variables measured for each participant, which reduces the ability to generalize. Finally, the sources of biological and technical variation, generally inconsistent from one dataset to another, must be considered to obtain reliable predictions in the real world. My research program offers concrete solutions to make these methodologies applicable to omics data, for specific biological problems. One of them aims to predict, from an individual's genetics, changes in the levels of RNA transcripts and metabolites.
Another problem targets the prediction of the risk of developing a complex disease based on an individual's omics data. For these concrete applications, we will use omics data from several biobanks, including local (Montreal Heart Institute Biobank), national (CanPath cohort) and international (UK Biobank) cohorts. We will develop these methodologies while ensuring that the results are interpretable and plausible, generalize well, and are independent of noise sources. Our program is interdisciplinary and offers a rich training opportunity for students. Our results will have the potential to improve the appropriate use of machine learning in molecular biology research, providing researchers and Canadian industry with robust tools for omics data analysis that can be interpreted by humans.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Hussin, Julie

A Family-Based Probabilistic Method for Capturing De Novo Mutations from High-Throughput Short-Read Sequencing Data

DOI:
10.2202/1544-6115.1713
Published:
2012-01-01
Journal:
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
Impact factor:
0.9
Authors:
Cartwright, Reed A.;Hussin, Julie;Awadalla, Philip
Corresponding author:
Awadalla, Philip

The race to understand immunopathology in COVID-19: Perspectives on the impact of quantitative approaches to understand within-host interactions.

DOI:
10.1016/j.immuno.2023.100021
Published:
2023-03
Journal:
Immunoinformatics (Amsterdam, Netherlands)
Impact factor:
0
Authors:
Gazeau, Sonia;Deng, Xiaoyan;Ooi, Hsu Kiang;Mostefai, Fatima;Hussin, Julie;Heffernan, Jane;Jenner, Adrianne L;Craig, Morgan
Corresponding author:
Craig, Morgan