Federated and transfer learning methods for cross-ancestry and cross-phenotype integration of genomic datasets
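Basic information
- Grant number: 10564023
- Principal investigator:
- Amount: $431,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-12-20 to 2027-11-30
- Project status: Ongoing
- Source: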
- Keywords: Accounting; Address; African ancestry; Algorithms; All of Us Research Program; Assessment tool; Biological; Cardiovascular Diseases; Cloud Computing; Communication; Community Medicine; Complex; Computer software; Data; Data Collection; Data Commons; Disease; Disparity; Disparity population; Electronic Health Record; Environment; European ancestry; Genetic; Genetic Risk; Genotype; Goals; Healthcare; Heritability; Heterogeneity; High Performance Computing; Inequity; Institution; Joints; Knowledge; Learning; Link; Major Depressive Disorder; Mental disorders; Meta-Analysis; Methodology; Methods; Minority Groups; Modeling; Participant; Patients; Performance; Phenotype; Population; Population Heterogeneity; Rare Diseases; Reproducibility; Risk Assessment; Risk Factors; Sample Size; Sampling; Source; Techniques; Testing; Trans-Omics for Precision Medicine; Translating; Underrepresented Populations; United States National Institutes of Health; Variant; biobank; clinical risk; cloud platform; cluster computing; computerized tools; cost effective; data integration; data privacy; disorder risk; diverse data; federated learning; genetic architecture; genome wide association study; genomic data; graph knowledge base; health disparity; high dimensionality; improved; insight; knowledge graph; learning algorithm; learning strategy; multi-ethnic; multi-task learning; novel; open source; participant enrollment; pleiotropism; polygenic risk score; precision medicine; privacy preservation; privacy protection; programs; risk prediction; risk prediction model; risk stratification; statistics; trait; transfer learning; user friendly software
Project abstract
Abstract
This proposal aims to develop advanced data integration methods for improving genetic risk prediction
in under-represented non-European populations. Genome-Wide Association Studies (GWAS) have yielded
important biological insights into the heritable basis of many complex traits and diseases, and polygenic risk
scores (PRS) have shown promising potential for disease risk stratification. However, since the vast majority of
participants in large-scale genomic datasets are from European ancestry (EA) populations, the performance of
current PRS is much poorer in non-EA populations than in EA populations, which may exacerbate existing health
disparities. Despite some recent inclusive data collection efforts, current risk prediction methods cannot effectively
address the heavily unbalanced sample sizes across populations. Robust data integration methods are needed to
leverage similarities in genetic architectures across ancestral populations, phenotypic correlations and pleiotropy,
and variant functional annotations while accounting for different sources of heterogeneity. Moreover, as various
national and institutional biobanks become available, efficient information-sharing strategies with data privacy
considerations are needed for combining data across biobanks to improve sample diversity and sample size. This
proposal will address these needs by developing a methodological framework with advanced transfer learning (TL)
and federated learning (FL) techniques for integrating various sources of data to bridge the gap in risk prediction
across populations. Specifically, in Aim 1, we will develop a TL method to integrate ancestrally diverse data
based on high-dimensional models with a distance-based regularization to characterize the similarities across
populations, and a communication-efficient FL algorithm that jointly fits the TL model across multiple biobanks with
only summary-level statistics. In Aim 2, we will develop methods that enable joint analyses of multiple phenotypes
in association tests and risk prediction models. We will develop an FL algorithm to combine data from multiple
biobanks for cross-phenotype association tests, and a TL method with an angle-based regularization to leverage
genetic correlations among mixed types of phenotypes in risk prediction. In Aim 3, we will develop knowledge-
graph-based TL methods that leverage the shared latent spaces between phenotype-genotype knowledge graphs
constructed from different ancestral populations and enable the incorporation of functional annotations. In Aim 4,
we will develop open-access statistical software capable of implementing the proposed methods in both offline
and cloud computing environments, and apply the proposed methods to the analysis of major depressive disorder
and cardiovascular diseases using data from the All of Us program, eMERGE, and the UK Biobank.
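As a rough illustration of how the two ideas in Aim 1 can fit together, the sketch below pairs a distance-based penalty with summary-level sharing in the simplest possible setting. It is not the proposal's method: it assumes a linear model, a squared Euclidean distance to a source-population estimate, and sites that share only the matrices X^T X and X^T y. The names `site_summary`, `fit_from_summaries`, `beta_source`, and `lam` are hypothetical.

```python
import numpy as np


def site_summary(X, y):
    """Return the summary statistics one site would share (assumed: X^T X and X^T y).

    Individual-level genotypes X and phenotypes y never leave the site.
    """
    return X.T @ X, X.T @ y


def fit_from_summaries(summaries, beta_source, lam):
    """Fit target-population effects from per-site summaries.

    Minimizes  sum_k ||y_k - X_k b||^2 + lam * ||b - beta_source||^2,
    where beta_source is an estimate from a larger source population
    (an illustrative stand-in for a distance-based regularizer).
    """
    p = beta_source.shape[0]
    xtx = sum(s[0] for s in summaries)  # pooled X^T X across sites
    xty = sum(s[1] for s in summaries)  # pooled X^T y across sites
    # Normal equations of the penalized objective:
    # (X^T X + lam * I) b = X^T y + lam * beta_source
    return np.linalg.solve(xtx + lam * np.eye(p), xty + lam * beta_source)
```

Because this penalized least-squares objective depends on the data only through the summed matrices, the fit from summaries matches the fit on pooled individual-level data. The parameter `lam` controls how strongly the target estimate is pulled toward the source estimate.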
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Rui Duan
Similar overseas grants
Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
- Grant number: MR/S03398X/2
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
- Grant number: EP/Y001486/1
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
- Grant number: 2338423
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
- Grant number: MR/X03657X/1
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
- Grant number: 2348066
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Standard Grant
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
- Grant number: 2341402
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
- Grant number: AH/Z505481/1
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10107647
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: EU-Funded
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10106221
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
- Grant number: AH/Z505341/1
- Fiscal year: 2024
- Funding amount: $431,500
- Project category: Research Grant













