Collaborative Research: ABI Innovation: Interpretable Machine Learning to Identify Molecular Markers for Complex Phenotypes

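Basic information

  • Award number:
    1759487
  • Principal investigator:
  • Amount:
    $1.4993 million
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-06-01 to 2024-05-31
  • Project status:
    Completed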

Project abstract

Biologists are now able to gather complete sets of gene expression data and protein concentrations for particular targets from specific tissues. The presence and concentrations of these molecules serve as features when determining a diagnostic pattern for specific states of development or disease. The approach to biomarker identification taken in this research attempts to find a set of features (here, gene expression levels) that best predict an outcome (protein levels occurring in the condition). The identified features, called biomarkers, can help determine the molecular basis for the condition. Unfortunately, false positive biomarkers are very common, as evidenced by low rates of replication in independent data sets and, consequently, by how rarely such markers become useful in applications such as clinical diagnostics. We seek to radically shift the current paradigm in biomarker discovery by resolving fundamental problems with the current approach: we use novel, theoretically well-founded machine learning (ML) methods to learn interpretable models from data, and follow this up with systematic experimental validation in model organisms.
The disease we model is Alzheimer's disease (AD), an urgent national and international research priority. Amyloid plaques and neurofibrillary tangles are the hallmarks of AD, and their building blocks are amyloid-beta and tau proteins, respectively. These proteins can be measured accurately from human brain tissues, as can global gene expression values. At present, we lack an understanding of the set of genes that affect the formation of plaques and tangles, or of any protective or pathological responses to these toxic peptides.
Biomarker discovery using high-throughput molecular data (e.g., gene expression data) has significantly advanced our knowledge of molecular biology and genetics.
The current approach attempts to find a set of features (e.g., gene expression levels) that best predict a phenotype, and then uses the selected features, called molecular markers, to determine the molecular basis for the phenotype. However, the low rates of replication in independent data point to three fundamental problems with this approach. First, high dimensionality, hidden variables, and feature correlations create a discrepancy between predictability (i.e., statistical associations) and true biological interactions; we need new feature selection criteria so that the model explains phenotypes rather than simply predicting them. Second, complex models (e.g., deep learning or ensemble models) can describe intricate relationships between genes and phenotypes more accurately than simpler, linear models, but they lack interpretability. Third, analyzing observational data without conducting interventional experiments does not prove causal relations. To address these problems, we propose an integrated machine learning methodology for learning interpretable models from data by 1) selecting interpretable features, 2) making interpretable predictions, and 3) validating and refining predictions through interventional experiments. This approach has the following aims:
1. Develop the NEBULA (network-based unsupervised feature learning) framework to learn, from publicly available multi-omic data sets, interpretable features that are likely to provide meaningful phenotype explanations.
2. Develop a unified framework, called SHAP (Shapley additive explanation), to interpret the predictions of complex models by estimating the importance of each feature to a particular prediction.
3. Validate and refine predictions through interventional experiments using high-throughput assays of gene knockdown on powerful nematode models of proteotoxicity.
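To make aim 2 concrete, the following is a minimal sketch of what "estimating the importance of each feature to a particular prediction" can mean when Shapley values are computed exactly. It is not the project's SHAP implementation: the function `shapley_values`, the `toy_model`, and the choice to represent an absent feature by a fixed baseline value are all assumptions made for illustration, and the brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for a single prediction.

    Features absent from a coalition are set to their baseline value
    (an assumption of this sketch; other conventions exist).
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features "present".
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by all coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: two expression levels interact, a third adds linearly.
    return 2.0 * z[0] * z[1] + z[2]


x = [1.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the interaction term is split equally between the first two features, and the attributions sum to the difference between the prediction at `x` and the prediction at the baseline. That additivity is the property that makes the per-feature values readable as a decomposition of one prediction.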
For further information see the project website at: http://suinlee.cs.washington.edu/projects/im3
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
From local explanations to global understanding with explainable AI for trees
  • DOI:
    10.1038/s42256-019-0138-9
  • Publication date:
    2020-01-01
  • Journal:
  • Impact factor:
    23.8
  • Authors:
    Lundberg, Scott M.;Erion, Gabriel;Lee, Su-In
  • Corresponding author:
    Lee, Su-In
Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer's disease neuropathologies.
  • DOI:
    10.1038/s41467-021-25680-7
  • Publication date:
    2021-09-10
  • Journal:
  • Impact factor:
    16.6
  • Authors:
    Beebe-Wang N;Celik S;Weinberger E;Sturmfels P;De Jager PL;Mostafavi S;Lee SI
  • Corresponding author:
    Lee SI
Improving performance of deep learning models with axiomatic attribution priors and expected gradients
  • DOI:
    10.1038/s42256-021-00343-w
  • Publication date:
    2021-05-31
  • Journal:
  • Impact factor:
    23.8
  • Authors:
    Erion, Gabriel;Janizek, Joseph D.;Lee, Su-In
  • Corresponding author:
    Lee, Su-In

Other publications by Su-In Lee

Titanizing on the surface of iron metal foam
  • DOI:
    10.1016/j.tca.2014.02.008
  • Publication date:
    2014-04-10
  • Journal:
  • Impact factor:
  • Authors:
    Su-In Lee;Jung-Yeul Yun;Tae-Soo Lim;Byoung-Kee Kim;Young-Min Kong;Jei-Pil Wang;Dong-Won Lee
  • Corresponding author:
    Dong-Won Lee
Deep profiling of gene expression across 18 human cancers
  • DOI:
    10.1038/s41551-024-01290-8
  • Publication date:
    2024-12-17
  • Journal:
  • Impact factor:
    26.600
  • Authors:
    Wei Qiu;Ayse B. Dincer;Joseph D. Janizek;Safiye Celik;Mikael J. Pittet;Kamila Naxerova;Su-In Lee
  • Corresponding author:
    Su-In Lee
Algorithms to estimate Shapley value feature attributions
  • DOI:
    10.1038/s42256-023-00657-x
  • Publication date:
    2023-05-22
  • Journal:
  • Impact factor:
    23.900
  • Authors:
    Hugh Chen;Ian C. Covert;Scott M. Lundberg;Su-In Lee
  • Corresponding author:
    Su-In Lee

Other grants by Su-In Lee

CAREER: Learning the Chromatin Network from ChIP-Seq Data
  • Award number:
    1552309
  • Fiscal year:
    2016
  • Funding amount:
    $1.4993 million
  • Award type:
    Continuing Grant
ABI Innovation: A Probabilistic Approach to Meta-Analysis of Biological Network Interface
  • Award number:
    1355899
  • Fiscal year:
    2014
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant

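Similar grants from the National Natural Science Foundation of China (NSFC)

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Award type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Award type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Award type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Award type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Award type:
    General Program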

Similar overseas grants

Collaborative Research: Sustainable ABI: Arctos Sustainability
  • Award number:
    2034568
  • Fiscal year:
    2021
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Innovation: FuTRES, an Ontology-Based Functional Trait Resource for Paleo- and Neo-biologists
  • Award number:
    2201182
  • Fiscal year:
    2021
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Development: Symbiota2: Enabling greater collaboration and flexibility for mobilizing biodiversity data
  • Award number:
    2209978
  • Fiscal year:
    2021
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Innovation: Towards Computational Exploration of Large-Scale Neuro-Morphological Datasets
  • Award number:
    2028361
  • Fiscal year:
    2020
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Innovation: Enabling machine-actionable semantics for comparative analyses of trait evolution
  • Award number:
    2048296
  • Fiscal year:
    2020
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Development: Integrated platforms for protein structure and function predictions
  • Award number:
    2021734
  • Fiscal year:
    2020
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Innovation: Biofilm Resource and Information Database (BRaID): A Tool to Fuse Diverse Biofilm Data Types
  • Award number:
    2027203
  • Fiscal year:
    2019
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Development: Building a Pipeline for Validation, Curation and Archiving of Integrative/Hybrid Models
  • Award number:
    1756250
  • Fiscal year:
    2018
  • Funding amount:
    $1.4993 million
  • Award type:
    Continuing Grant
Collaborative Research: ABI Development: The next stage in protein-protein docking
  • Award number:
    1759472
  • Fiscal year:
    2018
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant
Collaborative Research: ABI Innovation: Quantifying biogeographic history: a novel model-based approach to integrating data from genes, fossils, specimens, and environments
  • Award number:
    1759729
  • Fiscal year:
    2018
  • Funding amount:
    $1.4993 million
  • Award type:
    Standard Grant