Federated and transfer learning methods for cross-ancestry and cross-phenotype integration of genomic datasets
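Basic information
- Grant number: 10564023
- Principal investigator:
- Amount: $431,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-12-20 to 2027-11-30
- Project status: Ongoing
- Source: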
- Keywords: Accounting; Address; African ancestry; Algorithms; All of Us Research Program; Assessment tool; Biological; Cardiovascular Diseases; Cloud Computing; Communication; Community Medicine; Complex; Computer software; Data; Data Collection; Data Commons; Disease; Disparity; Disparity population; Electronic Health Record; Environment; European ancestry; Genetic; Genetic Risk; Genotype; Goals; Healthcare; Heritability; Heterogeneity; High Performance Computing; Inequity; Institution; Joints; Knowledge; Learning; Link; Major Depressive Disorder; Mental disorders; Meta-Analysis; Methodology; Methods; Minority Groups; Modeling; Participant; Patients; Performance; Phenotype; Population; Population Heterogeneity; Rare Diseases; Reproducibility; Risk Assessment; Risk Factors; Sample Size; Sampling; Source; Techniques; Testing; Trans-Omics for Precision Medicine; Translating; Underrepresented Populations; United States National Institutes of Health; Variant; biobank; clinical risk; cloud platform; cluster computing; computerized tools; cost effective; data integration; data privacy; disorder risk; diverse data; federated learning; genetic architecture; genome wide association study; genomic data; graph knowledge base; health disparity; high dimensionality; improved; insight; knowledge graph; learning algorithm; learning strategy; multi-ethnic; multi-task learning; novel; open source; participant enrollment; pleiotropism; polygenic risk score; precision medicine; privacy preservation; privacy protection; programs; risk prediction; risk prediction model; risk stratification; statistics; trait; transfer learning; user friendly software
Project abstract
This proposal aims to develop advanced data integration methods for improving genetic risk prediction
in under-represented non-European populations. Genome-Wide Association Studies (GWAS) have yielded
important biological insights into the heritable basis of many complex traits and diseases, and polygenic risk
scores (PRS) have shown promising potential for disease risk stratification. However, since the vast majority of
participants in large-scale genomic datasets are from European ancestry (EA) populations, the performance of
current PRS is much poorer in non-EA populations than in EA populations, which may exacerbate existing health
disparities. Despite some recent inclusive data collection efforts, current risk prediction methods cannot effectively
address the heavily unbalanced sample sizes across populations. Robust data integration methods are needed to
leverage similarities in genetic architectures across ancestral populations, phenotypic correlations and pleiotropy,
and variant functional annotations while accounting for different sources of heterogeneity. Moreover, as various
national and institutional biobanks become available, efficient information-sharing strategies with data privacy
considerations are needed for combining data across biobanks to improve sample diversity and sample size. This
proposal will address these needs by developing a methodological framework with advanced transfer learning (TL)
and federated learning (FL) techniques for integrating various sources of data to bridge the gap in risk prediction
across populations. Specifically, in Aim 1, we will develop a TL method to integrate ancestrally diverse data
based on high-dimensional models with a distance-based regularization to characterize the similarities across
populations, and a communication-efficient FL algorithm that jointly fits the TL model across multiple biobanks with
only summary-level statistics. In Aim 2, we will develop methods that enable joint analyses of multiple phenotypes
in association tests and risk prediction models. We will develop an FL algorithm to combine data from multiple
biobanks for cross-phenotype association tests, and a TL method with an angle-based regularization to leverage
genetic correlations among mixed types of phenotypes in risk prediction. In Aim 3, we will develop
knowledge-graph-based TL methods that leverage the shared latent spaces between phenotype-genotype knowledge graphs
constructed from different ancestral populations and enable the incorporation of functional annotations. In Aim 4,
we will develop open-access statistical software capable of implementing the proposed methods in both offline
and cloud computing environments, and apply the proposed methods to the analysis of major depressive disorder
and cardiovascular diseases using data from the All of Us program, eMERGE, and the UK Biobank.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Rui Duan
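Similar National Natural Science Foundation of China (NSFC) grants
Research on spatiotemporal-sequence-driven neuromorphic visual object recognition algorithms
- Grant number: 61906126
- Approval year: 2019
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund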
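Ontology-driven spatial semantic modeling of address data and address matching methods
- Grant number: 41901325
- Approval year: 2019
- Funding amount: CNY 220,000
- Project category: Young Scientists Fund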
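Optimized address mapping table design and memory access optimization for large-capacity solid-state drives
- Grant number: 61802133
- Approval year: 2018
- Funding amount: CNY 230,000
- Project category: Young Scientists Fund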
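Research on IP-address-driven multipath routing and traffic transmission control
- Grant number: 61872252
- Approval year: 2018
- Funding amount: CNY 640,000
- Project category: General Program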
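Research on memory safety defense techniques for objects targeted by memory attacks
- Grant number: 61802432
- Approval year: 2018
- Funding amount: CNY 250,000
- Project category: Young Scientists Fund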
Similar overseas grants
Genetic and Environmental Influences on Individual Sweet Preference Across Ancestry Groups in the U.S.
- Grant number: 10709381
- Fiscal year: 2023
- Funding amount: $431,500
- Project category:
A Mobile-Delivered Personalized Feedback Intervention for Black Individuals who Engage in Hazardous Drinking
- Grant number: 10821512
- Fiscal year: 2023
- Funding amount: $431,500
- Project category:
BridgePRS: bridging the gap in polygenic risk scores between ancestries.
- Grant number: 10737057
- Fiscal year: 2023
- Funding amount: $431,500
- Project category:
Innovative Deep Phenotyping of African Americans at Risk for Alzheimers disease
- Grant number: 10662056
- Fiscal year: 2023
- Funding amount: $431,500
- Project category:
Environmental Moderation of Genetic Influences on Dementia Risk in Mexican Older Adults
- Grant number: 10607226
- Fiscal year: 2023
- Funding amount: $431,500
- Project category: