Opening the Black Box of Machine Learning Models
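Basic information
- Grant number: 10020414
- Principal investigator:
- Amount: $388,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-07-01 to 2023-06-30
- Project status: Completed
- Source: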
- Keywords: Address; Basic Science; Big Data; Biological; Biological Process; Biology; Biomedical Research; Complex; Computing Methodologies; Data; Dependence; Development; Disease; Effectiveness; Future; Gene Expression; Genes; Healthcare; Intervention; Knowledge; Lead; Learning; Linear Models; Machine Learning; Measurement; Methodology; Modeling; Modernization; Molecular; Molecular Biology; Molecular Genetics; Outcome; Patient-Focused Outcomes; Phenotype; Research; Sampling; Selection Criteria; Signal Transduction; Statistical Models; Techniques; Technology; Training; Translating; biomarker discovery; clinical practice; clinically translatable; computer framework; deep learning; experimental study; feature selection; high dimensionality; innovation; inquiry-based learning; molecular marker; novel; precision medicine; predictive modeling; success; therapeutic target
Project Summary
Biomedical data is vastly increasing in quantity, scope, and generality, expanding opportunities to discover
novel biological processes and clinically translatable outcomes. Machine learning (ML), a key technology in
modern biology that addresses these changing dynamics, aims to infer meaningful interactions among variables
by learning their statistical relationships from data consisting of measurements on variables across samples.
Accurate inference of such interactions from big biological data can lead to novel biological discoveries,
therapeutic targets, and predictive models for patient outcomes. However, a greatly increased hypothesis space,
complex dependencies among variables, and complex “black-box” ML models pose complex, open challenges.
To meet these challenges, we have been developing innovative, rigorous, and principled ML techniques to infer
reliable, accurate, and interpretable statistical relationships in various kinds of biological network inference problems,
pushing the boundaries of both ML and biology.
Fundamental limitations of current ML techniques leave many future opportunities to translate inferred
statistical relationships into biological knowledge, as exemplified in a standard biomarker discovery problem –
an extremely important problem for precision medicine. Biomarker discovery using high-throughput molecular
data (e.g., gene expression data) has significantly advanced our knowledge of molecular biology and genetics.
The current approach attempts to find a set of features (e.g., gene expression levels) that best predict a phenotype
and use the selected features, or molecular markers, to determine the molecular basis for the phenotype.
However, the low success rates of replication in independent data and of reaching clinical practice indicate three
challenges posed by the current ML approach. First, high dimensionality, hidden variables, and feature correlations
create a discrepancy between predictability (i.e., statistical associations) and true biological interactions; we need
new feature selection criteria to make the model better explain rather than simply predict phenotypes. Second,
complex models (e.g., deep learning or ensemble models) can more accurately describe intricate relationships
between genes and phenotypes than simpler, linear models, but they lack interpretability. Third, analyzing
observational data without conducting interventional experiments does not prove causal relations.
To address these problems, we propose an integrated machine learning methodology for learning interpretable models
from data that will: 1) select interpretable features likely to provide meaningful phenotype explanations, 2) make
interpretable predictions by estimating the importance of each feature to a prediction, and 3) iteratively validate
and refine predictions through interventional experiments. For each challenge, we will develop a generalizable
ML framework that focuses on different aspects of model interpretability and will therefore be applicable to any
formerly intractable, high-impact healthcare problems. We will also demonstrate the effectiveness of each ML
framework for a wide range of topics, from basic science to disease biology to bedside applications.
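
As a rough illustration of aim 2 (estimating the importance of each feature to a single prediction), the sketch below computes exact Shapley-value attributions for a toy model. This is only one possible attribution scheme, chosen here for illustration; the summary does not say which method the project uses. The names `shapley_attributions` and `toy_model`, the all-zero baseline, and the three "gene expression" inputs are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_attributions(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch). Cost grows as 2**n, so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical phenotype score: gene 0 acts additively, genes 1 and 2 interact.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]          # hypothetical expression levels for one sample
baseline = [0.0, 0.0, 0.0]   # hypothetical reference sample
phi = shapley_attributions(toy_model, x, baseline)
print([round(p, 6) for p in phi])
```

For this toy model the attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, and the interaction term is split equally between genes 1 and 2. Such attributions describe the model, not biology; as the summary notes, causal claims would still require interventional experiments.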
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Su-In Lee
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets
- Grant number: 10132962
- Fiscal year: 2019
- Funding amount: $388,800
- Project category:
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets
- Grant number: 10613437
- Fiscal year: 2019
- Funding amount: $388,800
- Project category:
Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets
- Grant number: 10347341
- Fiscal year: 2019
- Funding amount: $388,800
- Project category:
Opening the Black Box of Machine Learning Models
- Grant number: 10437684
- Fiscal year: 2018
- Funding amount: $388,800
- Project category:
Opening the Black Box of Machine Learning Models
- Grant number: 10224845
- Fiscal year: 2018
- Funding amount: $388,800
- Project category:
Application of Data Sciences in Traumatic Brain Injury
- Grant number: 9685513
- Fiscal year: 2018
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10260483
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10438909
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10670111
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Core F: Artificial Intelligence and Bioinformatics
- Grant number: 10042623
- Fiscal year: 1997
- Funding amount: $388,800
- Project category:
Similar overseas grants
HNDS-R: Connectivity, Inclusiveness, and the Permeability of Basic Science
- Grant number: 2318404
- Fiscal year: 2023
- Funding amount: $388,800
- Project category: Standard Grant
Advancing the basic science of membrane permeability in macrocyclic peptides
- Grant number: 10552484
- Fiscal year: 2023
- Funding amount: $388,800
- Project category:
Computer Vision for Malaria Microscopy: Automated Detection and Classification of Plasmodium for Basic Science and Pre-Clinical Applications
- Grant number: 10576701
- Fiscal year: 2023
- Funding amount: $388,800
- Project category:
Bringing together communities and basic science researchers to build stronger relationships
- Grant number: 480914
- Fiscal year: 2023
- Funding amount: $388,800
- Project category: Miscellaneous Programs
“L-form” bacteria: basic science, antibiotics, evolution and biotechnology
- Grant number: FL210100071
- Fiscal year: 2022
- Funding amount: $388,800
- Project category: Australian Laureate Fellowships
Coordinating and Data Management Center for Translational and Basic Science Research in Early Lesions
- Grant number: 10517004
- Fiscal year: 2022
- Funding amount: $388,800
- Project category:
Developing science communication on large scale basic science represented by accelerator science
- Grant number: 22K02974
- Fiscal year: 2022
- Funding amount: $388,800
- Project category: Grant-in-Aid for Scientific Research (C)
Basic Science Core - Biosafety & Biocontainment Core (BBC)
- Grant number: 10431468
- Fiscal year: 2022
- Funding amount: $388,800
- Project category: