Statistical learning and causal inference in high-dimensional genomics data across multiple information layers
Basic information
- Grant number: RGPIN-2022-03708
- Amount: $13,800
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Status: Completed
Project abstract
An increasingly large amount of genomics data is being generated and shared to study human biology and complex disorders. Given the sheer volume and dimensionality of these data, statistical machine learning methods have become an indispensable tool for researchers who conduct exploratory data analysis and seek evidence to support working hypotheses. The main objective of this proposal is to develop a multi-omics machine learning (ML) approach for finding the causal mechanisms of human traits. Unlike existing ML methods in computational biology, which focus on a single type of omics data, we will emphasize the importance of diverse contextual information in studying complex phenotypes and the need to consider multiple data modalities in method development and analysis. To advance ML methods specialized for biomedical data analysis, we seek to achieve three long-term objectives. (1) We will present a new approach to data integration and exploratory modelling that reaches through multiple layers of biological information flow. We will implement scalable Bayesian inference methods for multi-modal single-cell data integration and interpretable stochastic block models for cell-cell, cell-gene, and gene-gene interactions. (2) By incorporating knowledge of the actual generative process, our ML methods will ascertain causal mechanisms across data modalities and eventually invite collaborators to dissect those mechanisms at molecular and cellular resolution. A contrastive learning approach will systematically combine multiple lines of scientific and statistical evidence to elucidate causal mechanisms through "causal triangulation," whereby confidence in a hypothesis of interest increases in light of different types of contrasts. (3) Since Bayesian inference is a crucial computational step in many scientific discoveries, including ours, we will strive to make inference methods widely applicable across scientific domains. Notably, we will implement a black-box learning algorithm that takes both individual-level and summary-statistics data.
Using our ML approach, in collaboration with biomedical research groups, we will ask fundamental questions in human biology: What are the natural distributions of human traits? Can we characterize the principal axes of phenotypic variation? How are different traits linked with one another? What are the key contributors to the transition from healthy to pathological states? While pursuing these long-term goals in statistics, my group will analyze large volumes of real-world data to give quantitative answers to numerous scientific questions. We will organize a total of seven highly qualified personnel (HQPs) into three working groups based on scientific interests: cancer biology (2 HQPs), single-cell methodology (3 HQPs), and immune disorders (2 HQPs). We collaborate with world-class experimental laboratories at the University of British Columbia, the University of Victoria, Yale, and MIT.
Project outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Park, Yongjin
Multitissue H3K27ac profiling of GTEx samples links epigenomic variation to disease.
- DOI: 10.1038/s41588-023-01509-5
- Published: 2023-10
- Impact factor: 30.8
- Authors: Hou, Lei; Xiong, Xushen; Park, Yongjin; Boix, Carles; James, Benjamin; Sun, Na; He, Liang; Patel, Aman; Zhang, Zhizhuo; Molinie, Benoit; Van Wittenberghe, Nicholas; Steelman, Scott; Nusbaum, Chad; Aguet, Francois; Ardlie, Kristin G.; Kellis, Manolis
- Corresponding author: Kellis, Manolis
More than Altruism: Cultural Norms and Remittances Among Hispanics in the USA
- DOI: 10.1007/s12134-015-0423-3
- Published: 2016-05-01
- Impact factor: 1.3
- Authors: Lopez-Anuarbe, Monika; Cruz-Saco, Maria Amparo; Park, Yongjin
- Corresponding author: Park, Yongjin
Blue Transparent OLEDs with High Stability and Transmittance for Modulating Sleep Disorders
- DOI: 10.1002/admi.202202443
- Published: 2023-03-09
- Impact factor: 5.4
- Authors: Chae, Hyeonwook; Park, Yongjin; Choi, Kyung Cheol
- Corresponding author: Choi, Kyung Cheol
Single-nucleus multiregion transcriptomic analysis of brain vasculature in Alzheimer's disease.
- DOI: 10.1038/s41593-023-01334-3
- Published: 2023-06
- Impact factor: 25
- Authors: Sun, Na; Akay, Leyla Anne; Murdock, Mitchell H.; Park, Yongjin; Galiana-Melendez, Fabiola; Bubnys, Adele; Galani, Kyriaki; Mathys, Hansruedi; Jiang, Xueqiao; Ng, Ayesha P.; Bennett, David A.; Tsai, Li-Huei; Kellis, Manolis
- Corresponding author: Kellis, Manolis
Adsorption of Pt on defective carbon nanotube walls: a DFT approach
- DOI: 10.1016/j.cpc.2007.02.104
- Published: 2007-07-01
- Impact factor: 6.3
- Authors: Park, Yongjin; Lahaye, Rob J. W. E.; Lee, Young Hee
- Corresponding author: Lee, Young Hee
Other grants by Park, Yongjin
Statistical learning and causal inference in high-dimensional genomics data across multiple information layers
- Grant number: DGECR-2022-00445
- Fiscal year: 2022
- Amount: $13,800
- Program: Discovery Launch Supplement
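Similar grants from the National Natural Science Foundation of China
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Approval year: 2024
- Program: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Grant number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Program: Provincial/municipal project
Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowdsensing tasks for coal mine safety
- Approval year: 2022
- Amount: CNY 300,000
- Program: Young Scientists Fund
Short-horizon online Q-learning cooperative control mechanisms for intelligent munition formations, accounting for leader failure
- Grant number: 62003314
- Approval year: 2020
- Amount: CNY 240,000
- Program: Young Scientists Fund
E-learning resource recommendation methods integrating contextual tensor factorization
- Grant number: 61902016
- Approval year: 2019
- Amount: CNY 240,000
- Program: Young Scientists Fund
Effects of children's musical ability development on language, social cognition, and brain development
- Grant number: 31971003
- Approval year: 2019
- Amount: CNY 580,000
- Program: General Program
Spiking-Transfer learning methods with temporal transfer capability
- Grant number: 61806040
- Approval year: 2018
- Amount: CNY 200,000
- Program: Young Scientists Fund
Deep-learning-based dynamic recognition techniques for glacier monitoring in the Sanjiangyuan region
- Grant number: 51769027
- Approval year: 2017
- Amount: CNY 380,000
- Program: Regional Science Fund
Key techniques for learner interest mining based on joint behavior-emotion-topic modeling in multi-scenario online learning
- Grant number: 61702207
- Approval year: 2017
- Amount: CNY 210,000
- Program: Young Scientists Fund
Deep mining techniques for heterogeneous medical imaging data and precise prediction of major central nervous system diseases
- Grant number: 61672236
- Approval year: 2016
- Amount: CNY 640,000
- Program: General Program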
Similar overseas grants
Next-Generation Algorithms in Statistical Genetics Based on Modern Machine Learning
- Grant number: 10714930
- Fiscal year: 2023
- Amount: $13,800
Bayesian Statistical Learning for Robust and Generalizable Causal Inferences in Alzheimer Disease and Related Disorders Research
- Grant number: 10590913
- Fiscal year: 2023
- Amount: $13,800
Statistical methods to characterize causal mechanisms by which air pollution affects the recurrence of cardiovascular events
- Grant number: 10660281
- Fiscal year: 2023
- Amount: $13,800
Statistical Methods for Inferring Gene-Phenotype Associations Using Omic Data from Gene Knockout and Human Phenotype Studies
- Grant number: 10733165
- Fiscal year: 2023
- Amount: $13,800
Statistical learning and causal inference in high-dimensional genomics data across multiple information layers
- Grant number: DGECR-2022-00445
- Fiscal year: 2022
- Amount: $13,800
- Program: Discovery Launch Supplement
Development of robust statistical and machine learning algorithms for extrapolation in causal inference
- Grant number: 2740759
- Fiscal year: 2022
- Amount: $13,800
- Program: Studentship
Statistical Inference Based on Real-World-Data
- Grant number: 19H04072
- Fiscal year: 2019
- Amount: $13,800
- Program: Grant-in-Aid for Scientific Research (B)
Statistical Modeling of Multiparental and Genetic Reference Populations
- Grant number: 10373986
- Fiscal year: 2018
- Amount: $13,800
Statistical Modeling of Multiparental and Genetic Reference Populations
- Grant number: 9893003
- Fiscal year: 2018
- Amount: $13,800
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 10676866
- Fiscal year: 2015
- Amount: $13,800