
Federated and transfer learning methods for cross-ancestry and cross-phenotype integration of genomic datasets

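Basic information

Grant number:
10564023
Principal investigator:
Rui Duan
Amount:
$431,500
Host institution:
HARVARD SCHOOL OF PUBLIC HEALTH
Host institution country:
United States
Project category:
Fiscal year:
2022
Funding country:
United States
Project period:
2022-12-20 to 2027-11-30
Project status:
Ongoing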

Source:
https://reporter.nih.gov/project-details/10564023
Keywords:
Accounting; Address; African ancestry; Algorithms; All of Us Research Program; Assessment tool; Biological; Cardiovascular Diseases; Cloud Computing; Communication; Community Medicine; Complex; Computer software; Data; Data Collection; Data Commons; Disease; Disparity; Disparity population; Electronic Health Record; Environment; European ancestry; Genetic; Genetic Risk; Genotype; Goals; Healthcare; Heritability; Heterogeneity; High Performance Computing; Inequity; Institution; Joints; Knowledge; Learning; Link; Major Depressive Disorder; Mental disorders; Meta-Analysis; Methodology; Methods; Minority Groups; Modeling; Participant; Patients; Performance; Phenotype; Population; Population Heterogeneity; Rare Diseases; Reproducibility; Risk Assessment; Risk Factors; Sample Size; Sampling; Source; Techniques; Testing; Trans-Omics for Precision Medicine; Translating; Underrepresented Populations; United States National Institutes of Health; Variant; biobank; clinical risk; cloud platform; cluster computing; computerized tools; cost effective; data integration; data privacy; disorder risk; diverse data; federated learning; genetic architecture; genome wide association study; genomic data; graph knowledge base; health disparity; high dimensionality; improved; insight; knowledge graph; learning algorithm; learning strategy; multi-ethnic; multi-task learning; novel; open source; participant enrollment; pleiotropism; polygenic risk score; precision medicine; privacy preservation; privacy protection; programs; risk prediction; risk prediction model; risk stratification; statistics; trait; transfer learning; user friendly software

Project abstract

This proposal aims to develop advanced data integration methods for improving genetic risk prediction in under-represented non-European populations. Genome-wide association studies (GWAS) have yielded important biological insights into the heritable basis of many complex traits and diseases, and polygenic risk scores (PRS) have shown promising potential for disease risk stratification. However, because the vast majority of participants in large-scale genomic datasets are of European ancestry (EA), current PRS perform much worse in non-EA populations than in EA populations, which may exacerbate existing health disparities. Despite some recent inclusive data collection efforts, current risk prediction methods cannot effectively address the heavily unbalanced sample sizes across populations. Robust data integration methods are needed to leverage similarities in genetic architectures across ancestral populations, phenotypic correlations and pleiotropy, and variant functional annotations while accounting for different sources of heterogeneity. Moreover, as various national and institutional biobanks become available, efficient information-sharing strategies with data privacy considerations are needed for combining data across biobanks to improve sample diversity and sample size. This proposal will address these needs by developing a methodological framework with advanced transfer learning (TL) and federated learning (FL) techniques for integrating various sources of data to bridge the gap in risk prediction across populations. Specifically, in Aim 1, we will develop a TL method to integrate ancestrally diverse data based on high-dimensional models with a distance-based regularization to characterize the similarities across populations, and a communication-efficient FL algorithm that jointly fits the TL model across multiple biobanks with only summary-level statistics. In Aim 2, we will develop methods that enable joint analyses of multiple phenotypes in association tests and risk prediction models. We will develop an FL algorithm to combine data from multiple biobanks for cross-phenotype association tests, and a TL method with an angle-based regularization to leverage genetic correlations among mixed types of phenotypes in risk prediction. In Aim 3, we will develop knowledge-graph-based TL methods that leverage the shared latent spaces between phenotype-genotype knowledge graphs constructed from different ancestral populations and enable the incorporation of functional annotations. In Aim 4, we will develop open-access statistical software capable of implementing the proposed methods in both offline and cloud computing environments, and apply the proposed methods to the analysis of major depressive disorder and cardiovascular diseases using data from the All of Us program, eMERGE, and the UK Biobank.
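
To make the Aim 1 idea concrete, the following is a minimal illustrative sketch, not the method proposed in the project. It assumes a plain linear model, a squared Euclidean distance penalty that pulls target-population effects toward effects estimated in a source population, and summary statistics of the form X'X and X'y shared by each site. The function names `site_summary` and `federated_transfer_fit`, the penalty weight `lam`, and the simulated data are all hypothetical.

```python
import numpy as np


def site_summary(X, y):
    """Run locally at each site; only these aggregates are shared (assumed form)."""
    return X.T @ X, X.T @ y


def federated_transfer_fit(summaries, beta_source, lam):
    """Minimize sum_k ||y_k - X_k b||^2 + lam * ||b - beta_source||^2.

    The closed-form solution needs only the summed X'X and X'y, so no
    individual-level data has to leave a site in this simplified setting.
    """
    p = beta_source.shape[0]
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx + lam * np.eye(p), xty + lam * beta_source)


# Simulated example: two small target-population sites and a source estimate.
rng = np.random.default_rng(0)
p = 5
beta_source = rng.normal(size=p)                       # stand-in for source-population effects
beta_target = beta_source + 0.1 * rng.normal(size=p)   # similar but not identical effects

sites = []
for n in (40, 60):
    X = rng.normal(size=(n, p))
    y = X @ beta_target + rng.normal(scale=0.5, size=n)
    sites.append((X, y))

summaries = [site_summary(X, y) for X, y in sites]
beta_hat = federated_transfer_fit(summaries, beta_source, lam=10.0)
```

In this simplified setting the federated estimate coincides with the estimate obtained by pooling all individual-level data, and a larger `lam` shrinks the result more strongly toward the source-population effects. The high-dimensional models and regularizers described in the abstract would be more involved than this closed-form case.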

Summary