
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets


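Basic Information

Project Number:
10613437
Principal Investigator:
Su-In Lee
Award Amount:
approx. $582,000
Institution:
UNIVERSITY OF WASHINGTON
Institution Country:
United States
Project Category:
Fiscal Year:
2019
Funding Country:
United States
Project Period:
2019-02-15 to 2024-12-21
Project Status:
Completed

Source:
https://reporter.nih.gov/project-details/10613437
Keywords: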
Acceleration; Address; Affect; Algorithms; Alzheimer's Disease; Alzheimer's disease diagnosis; Alzheimer's disease model; Alzheimer's disease therapeutic; Amyloid beta-Protein; Animal Model; Award; Big Data; Biological; Biological Markers; Brain; Caenorhabditis elegans; Cause of Death; Cell physiology; Classification; Collaborations; Complex; Computer Models; Country; Data; Data Science; Data Set; Disease; Disease Progression; Drug Targeting; Educational workshop; Elasticity; Frequencies; Gene Expression; Genes; Genetic study; Heterogeneity; Human; Image; Individual; International; Intervention; Knowledge; Label; Lasso; Learning; Linear Models; Machine Learning; Measures; Methods; Modeling; Molecular; Molecular Chaperones; Multiomic Data; Nature; Nematoda; Network-based; Neurofibrillary Tangles; Oral; Orthologous Gene; Outcome; Paper; Pathogenesis; Pathologic; Pathology; Pathway interactions; Peptides; Phenotype; Play; Prevention; RNA Interference; Random Allocation; Research Priority; Role; Selection Criteria; Senile Plaques; Signal Transduction; Statistical Models; Techniques; Testing; Toxic effect; Training; Transgenic Organisms; Trees; United States; Validation; autoencoder; biomarker discovery; brain tissue; candidate identification; candidate marker; clinical practice; deep learning; deep learning model; direct application; disease heterogeneity; drug response prediction; effective therapy; experimental study; feature selection; gene function; gene interaction; gene network; high dimensionality; human data; improved; in vivo; interest; knock-down; machine learning algorithm; machine learning framework; machine learning method; molecular marker; neuropathology; novel; outcome prediction; phenotypic biomarker; precision medicine; predictive modeling; protective factors; proteostasis; rapid growth; response; success; tau Proteins; therapeutic target; therapy development

Project Abstract

Alzheimer’s disease (AD) is an urgent national and international research priority. Amyloid plaques and neurofibrillary tangles are the hallmarks of AD. Their building blocks are Amyloid-β (Aβ) and tau, respectively. At present, we lack an understanding of the set of genes that affect the formation of plaques and tangles, along with the protective and pathological responses to these toxic peptides. Biologists are now gathering gene expression data and Aβ and tau measures from human brain tissues. The current approach attempts to find a set of features (here, gene expression levels) that best predict an outcome (Aβ or tau level). The identified features, called biomarkers, can help determine the molecular basis for plaques and tangles. Unfortunately, false positive biomarkers are very common, as evidenced by low rates of replication in independent data and a low rate of success in reaching clinical practice (less than 1%). We seek to radically shift the current paradigm in biomarker discovery by resolving three fundamental problems with the current approach, using novel, theoretically well-founded machine learning (ML) methods to learn interpretable models from data.

Aim 1. Learn an interpretable feature representation from publicly available, high-throughput brain data. High dimensionality, hidden variables, and complex feature correlations create a discrepancy between predictability (i.e., observed statistical associations) and true biological interactions. To improve the chance of identifying true positive biomarkers, we need new feature selection criteria to learn a model that better explains, rather than simply predicts, the outcome. To do so, our proposed ML algorithms will identify the genes that are likely to give a meaningful explanation of the outcome (Aβ or tau level) by inferring both the functions of genes in the cellular processes contributing to AD and the gene interaction network from many existing brain datasets.

Aim 2. Make interpretable predictions using a unified framework to explain model predictions. Due to disease heterogeneity, complex models (e.g., deep learning or ensemble models) often describe relationships between genes and an outcome more accurately than simpler, linear models, but lack interpretability. We will develop a novel ML framework that interprets complex model predictions by estimating the importance of each feature to a specific prediction, which will identify features of high importance for each individual as personalized markers and classify subjects based on these importance estimates.

Aim 3. Validate the identified candidate biomarkers using powerful worm models of AD. Analyzing observational data without doing interventional experiments cannot prove causal relationships. In collaboration with co-I Matt Kaeberlein, we will utilize powerful nematode models of AD to test our hypotheses on the role of certain genes as disease modifiers, and develop a new way to refine the models based on this knowledge.

Successful completion of this project will result in a previously unknown molecular basis for Aβ and tau levels, potential therapeutic targets, and general ML techniques widely applicable to many other data science problems.
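
As an illustration of what "estimating the importance of each feature to a specific prediction" (Aim 2) can mean in practice, the sketch below computes exact Shapley values for one subject's prediction. This is an assumption made for illustration only: the abstract does not name the attribution method, and Shapley values are simply one well-known way to split a prediction among input features. The `model` function, the three "gene expression" features, the `background` samples, and the `subject` vector are all hypothetical.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical "complex" model: an interaction between gene 0 and
    # gene 1 plus a linear effect of gene 2 (not from the project).
    return 2.0 * x[0] * x[1] + 0.5 * x[2]


def coalition_value(f, x, background, subset):
    """Mean prediction when features in `subset` take the subject's
    values and all other features take values from background samples."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += f(mixed)
    return total / len(background)


def shapley_values(f, x, background):
    """Exact Shapley attribution by enumerating every feature subset.
    Cost grows as 2^n, so this is only feasible for a few features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                gain = (coalition_value(f, x, background, s | {i})
                        - coalition_value(f, x, background, s))
                phi[i] += weight * gain
    return phi


# Hypothetical reference samples and one subject to explain.
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
subject = [1.0, 2.0, 4.0]

phi = shapley_values(model, subject, background)
baseline = sum(model(b) for b in background) / len(background)

for gene_index, contribution in enumerate(phi):
    print(f"gene {gene_index}: contribution {contribution:+.3f}")
print(f"prediction - baseline = {model(subject) - baseline:+.3f}")
```

The per-feature contributions add up to the gap between this subject's prediction and the average prediction over the background samples, which is what makes them usable as per-individual importance estimates. Real gene expression data has thousands of features, so exact enumeration like this would have to be replaced by an approximation.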
