Causal and integrative deep learning for Alzheimer's disease genetics
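Basic information
- Grant number: 10267373
- Principal investigator:
- Amount: $733,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-15 to 2026-08-31
- Project status: Ongoing
- Source: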
- Keywords: Algorithms; Alzheimer's Disease; Alzheimer's disease risk; Biological; Brain; Brain region; Communities; Complex; Computer software; DNA Methylation; Data; Data Analyses; Data Set; Disease; Documentation; Early Diagnosis; Epigenetic Process; Etiology; Gene Expression; Gene Proteins; Genetic; Genetic Diseases; Genome; Genomics; Goals; Image; Influentials; Intervention; Knowledge; Least-Squares Analysis; Linear Models; Linear Regressions; Methods; Modeling; Molecular; Molecular Target; Motivation; Neural Network Simulation; Non-linear Models; Operant Conditioning; Outcome; Prevention; Protein Region; Proteomics; Public Domains; Publishing; Pythons; Research; Risk; Risk Factors; Sampling; Systems Analysis; Technology; TensorFlow; Therapeutic Intervention; Time; Tweens; base; causal variant; cognitive system; computerized tools; deep learning; deep neural network; drug development; endophenotype; epigenomics; flexibility; functional genomics; genetic association; genome sequencing; genome wide association study; genome-wide; genomic data; improved; insight; interest; learning strategy; machine learning method; modifiable risk; molecular imaging; neural network; neuroimaging; novel; phenotypic data; pleiotropism; predictive modeling; programs; protective factors; response; software development; statistical and machine learning; therapeutic development; therapy development; trait; transcriptome; transcriptomics; whole genome
Project abstract
Summary
In response to PAR-19-269, “Cognitive Systems Analysis of Alzheimer's Disease Genetic and Phenotypic Data”, we propose developing and applying more powerful and robust machine learning methods for causal and integrative analysis, especially deep learning approaches for instrumental variable analysis, to identify causal risk and protective factors for Alzheimer's disease (AD) in the post-GWAS era by leveraging published large-scale GWAS, whole-genome sequencing (WGS), and other omic and neuroimaging data. Our main motivation is to extend an emerging and increasingly influential approach that integrates GWAS with gene expression data, called transcriptome-wide association studies (TWAS). TWAS aims to improve on current GWAS practice not only by increasing statistical power but also by identifying (putative) causal genes, thus yielding insight into the genetic basis of common diseases and complex traits. The statistical principle underlying TWAS is (two-sample) two-stage least squares (2SLS) for linear models, within the framework of instrumental variable (IV) analysis for causal inference.
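The two-sample 2SLS principle can be illustrated with a minimal simulation sketch. This is not the project's software: it assumes a single SNP as the instrument, purely linear effects, and made-up effect sizes, and the helper names (`ols`, `simulate`, `two_sample_2sls`) are hypothetical. Stage 1 learns expression from the SNP in a reference sample; stage 2 regresses the trait on the imputed expression in a separate GWAS sample.

```python
import random


def ols(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def simulate(n, rng, beta=0.3, alpha=1.0):
    """Simulate SNP z, expression x and trait y sharing a hidden confounder u.

    All effect sizes are illustrative assumptions, not estimates from real data.
    """
    z, x, y = [], [], []
    for _ in range(n):
        zi = sum(rng.random() < 0.3 for _ in range(2))  # genotype: 0, 1 or 2 minor alleles
        u = rng.gauss(0.0, 1.0)                         # unobserved confounder
        xi = alpha * zi + u + rng.gauss(0.0, 1.0)       # expression depends on SNP and u
        yi = beta * xi + u + rng.gauss(0.0, 1.0)        # trait depends on expression and u
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def two_sample_2sls(z_ref, x_ref, z_gwas, y_gwas):
    """Stage 1 on the reference sample, stage 2 on the GWAS sample."""
    a1, b1 = ols(z_ref, x_ref)               # stage 1: expression ~ SNP
    x_hat = [a1 + b1 * zi for zi in z_gwas]  # imputed expression in the GWAS sample
    _, beta_hat = ols(x_hat, y_gwas)         # stage 2: trait ~ imputed expression
    return beta_hat


rng = random.Random(0)
z_ref, x_ref, _ = simulate(5_000, rng)          # stands in for an expression reference panel
z_gwas, x_gwas, y_gwas = simulate(20_000, rng)  # stands in for a GWAS cohort

beta_2sls = two_sample_2sls(z_ref, x_ref, z_gwas, y_gwas)
_, beta_naive = ols(x_gwas, y_gwas)  # confounded comparison, available only in simulation
print(f"2SLS estimate:  {beta_2sls:.3f}")
print(f"naive estimate: {beta_naive:.3f}")
```

Under these assumptions the 2SLS estimate should land near the simulated effect of 0.3, while the naive regression of trait on expression is pulled upward by the confounder.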
In practice, however, TWAS may miss true causal genes and yield false positives when its modeling assumptions are violated, e.g. because of non-linear effects of IVs or gene expression, or because of invalid IVs (in the presence of horizontal pleiotropy of SNPs). First, we propose developing linear and neural network models that incorporate a large number of functional annotations on the genome (e.g. various types of functional genomic and epigenetic data from the ENCODE and Roadmap Epigenomics projects) as prior knowledge to improve imputing/predicting gene expression (or other molecular or imaging endophenotypes, or complex traits/diseases) from SNPs, corresponding to the first stage of 2SLS. Second, we propose neural networks as more flexible non-linear models for the second stage of 2SLS in the presence of invalid IVs, which may be SNPs with direct (horizontal pleiotropic) effects on the outcome, as expected given widespread pleiotropy. We then combine the approaches for the two stages to form a more flexible and robust neural network approach as an extension of 2SLS for causal inference. Third, we consider inferring causal directions between two traits, e.g. a gene's expression and AD, allowing non-linear relationships between SNPs and traits and between the two traits. This is critical for reducing false positives, e.g. those due to reverse causation, but has been largely under-studied. Fourth, we apply the new (and existing) methods to transcriptomic, proteomic, neuroimaging and AD GWAS/WGS data to identify (putative) causal genes, proteins and brain regions of interest (ROIs) for AD, while building the corresponding genetic prediction models for endophenotypes and AD risk. Finally, we will develop and disseminate publicly available software implementing the proposed analysis methods, e.g. as Python programs or R packages, to facilitate wide use by the scientific community.
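The invalid-IV problem named in the abstract can be made concrete with a small simulation sketch. It is only an illustration of why horizontal pleiotropy matters, not one of the proposed neural network methods: it assumes one SNP, linear effects, and invented effect sizes, and the function names (`slope`, `iv_estimate`) are hypothetical. When the SNP has a direct effect `gamma` on the trait, the single-instrument IV (ratio) estimate targets `beta + gamma / alpha` rather than the causal effect `beta`.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def iv_estimate(n, gamma, rng, alpha=1.0, beta=0.3):
    """Single-SNP IV estimate of beta when the SNP also acts directly on the trait.

    gamma is the direct (horizontally pleiotropic) SNP effect; gamma = 0 means
    the SNP is a valid instrument. All effect sizes are illustrative assumptions.
    """
    z, x, y = [], [], []
    for _ in range(n):
        zi = sum(rng.random() < 0.3 for _ in range(2))     # genotype: 0, 1 or 2
        u = rng.gauss(0.0, 1.0)                            # unobserved confounder
        xi = alpha * zi + u + rng.gauss(0.0, 1.0)          # expression
        yi = beta * xi + gamma * zi + u + rng.gauss(0.0, 1.0)  # trait with direct SNP effect
        z.append(zi)
        x.append(xi)
        y.append(yi)
    # Ratio of SNP-trait slope to SNP-expression slope; with one instrument
    # this coincides with the 2SLS estimate.
    return slope(z, y) / slope(z, x)


rng = random.Random(1)
valid = iv_estimate(20_000, gamma=0.0, rng=rng)    # targets beta = 0.3
invalid = iv_estimate(20_000, gamma=0.3, rng=rng)  # targets beta + gamma / alpha = 0.6
print(f"valid IV:   {valid:.3f}")
print(f"invalid IV: {invalid:.3f}")
```

In this toy setting the pleiotropic SNP roughly doubles the apparent effect, which is the kind of false signal that more robust second-stage models aim to avoid.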
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Wei Pan
Estimation and inference in directed acyclic graphical models for biological networks
- Grant number: 10330130
- Fiscal year: 2022
- Funding amount: $733,400
- Project category:
Estimation and inference in directed acyclic graphical models for biological networks
- Grant number: 10595510
- Fiscal year: 2022
- Funding amount: $733,400
- Project category:
Causal and integrative deep learning for Alzheimer's disease genetics
- Grant number: 10483117
- Fiscal year: 2021
- Funding amount: $733,400
- Project category:
Discovering causal genes, brain regions and other risk factors for Alzheimer's disease
- Grant number: 10358645
- Fiscal year: 2020
- Funding amount: $733,400
- Project category:
Integrating Alzheimer's disease GWAS with proteomic and metabolomic QTL data
- Grant number: 10018279
- Fiscal year: 2020
- Funding amount: $733,400
- Project category:
Deep Learning with Neuroimaging Genetic Data for Alzheimer's Disease
- Grant number: 10647797
- Fiscal year: 2020
- Funding amount: $733,400
- Project category:
Discovering causal genes, brain regions and other risk factors for Alzheimer's disease
- Grant number: 10561609
- Fiscal year: 2020
- Funding amount: $733,400
- Project category:
Deep Learning with Neuroimaging Genetic Data for Alzheimer's Disease
- Grant number: 10267714
- Fiscal year: 2020
- Funding amount: $733,400
- Project category:
Discovering causal genes, brain regions and other risk factors for Alzheimer's disease
- Grant number: 10116249
- Fiscal year: 2020
- Funding amount: $733,400
- Project category:
Deep Learning with Neuroimaging Genetic Data for Alzheimer's Disease
- Grant number: 10088703
- Fiscal year: 2020
- Funding amount: $733,400
- Project category: