Statistical Methods for Integrative Analysis of Large-Scale Multi-Ethnic Whole Genome Sequencing Studies and Biobanks of Common Diseases
Basic information
- Grant number: 10622567
- Principal investigator:
- Amount: $499,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-05-15 to 2026-04-30
- Project status: Ongoing
- Source:
- Keywords: Address; Biological Markers; Cells; Cellular Assay; Cloud Computing; Code; Complex; Computer software; Computing Methodologies; Data; Data Set; Disease; Electronic Health Record; Environment; Epidemiology; European; European ancestry; FAIR principles; Face; Genetic; Genetic Research; Genetic study; Genome; Heart Diseases; Individual; Institution; Intervention; Lung diseases; Mendelian randomization; Meta-Analysis; Methods; Modeling; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Nature; Pathway interactions; Performance; Population; Prevention strategy; Principal Component Analysis; Research; Risk Factors; Sample Size; Sampling; Statistical Methods; Structure; System; Testing; Trans-Omics for Precision Medicine; Underrepresented Populations; United States National Institutes of Health; Untranslated RNA; Variant; Veterans; biobank; catalyst; cell type; cloud platform; cluster computing; data privacy; disorder risk; empowerment; exome; genetic variant; genome sequencing; genome wide association study; health disparity; improved; interest; learning strategy; multi-ethnic; polygenic risk score; power analysis; privacy protection; programs; rare variant; research study; risk prediction; statistical learning; trait; user friendly software; whole genome
Project abstract
This proposal aims to develop advanced and scalable statistical methods for integrative analysis of large-scale
Whole Genome Sequencing (WGS) studies and biobanks of common diseases, such as heart and lung
diseases. Genome-Wide Association Studies (GWAS) have revealed thousands of genetic variants associated
with many common diseases, but are largely limited to common variants in individuals of predominantly
European ancestry. Large-scale multi-ethnic WGS studies and biobanks have rapidly emerged to overcome
these limitations, and to study the genetic underpinnings of complex diseases and traits in both coding and
non-coding rare variants across populations. Examples include the NHLBI Trans-Omics for Precision Medicine
Program (TOPMed), the NHGRI Genome Sequencing Program (GSP), UK Biobank, and All of Us. Various
omics data are also available in TOPMed. Full use of these datasets can fuel genetic discoveries applicable
to genetically understudied populations. These studies consist of hundreds of millions of rare variants (RVs),
and their analysis faces several challenges. First, although several methods have been developed for RV
analysis, they have limited power for analysis of non-coding RVs, as their functions are unknown or cell-type
specific. There is a pressing need to empower RV Association Tests (RVATs) for non-coding variants by
developing more powerful statistical learning methods using integrative analysis and incorporating cell-type
specific variant functional annotations. Second, the large sample sizes of WGS studies and the data privacy
considerations of many national and institutional biobanks, often with unbalanced case-control ratios, call for
distributed WGS analyses. Third, it is of substantial interest to develop polygenic risk scores using both
common and rare variants in WGS studies, and to investigate causal effects of biomarkers and omics markers
on diseases through Mendelian Randomization (MR) with both common and rare variants as instrumental
variables. This proposal addresses these needs through four aims. First, we will develop statistical
learning-based ensemble RVATs to boost power. This ensemble RVAT framework will be extended to use
cell-type-specific functional annotations calculated from single-cell assays, and to perform meta-analysis.
Second, we will develop distributed methods for important tasks in the analysis of large WGS and federated
biobank data: estimating population structure via distributed fast principal component analysis, distributed
methods for fitting generalized linear mixed models, and distributed RVATs. Third, we will develop methods for
polygenic risk scores (PRSs) using both common and rare variants in WGS studies, and develop Mendelian
Randomization methods for studying the causal effects of biomarkers and omics markers on diseases by using
WGS-based PRSs as instrumental variables. Fourth, we will develop open-access statistical software capable of
implementing our proposed methods in both offline and cloud computing environments. We will apply the
proposed methods to the analysis of the TOPMed and GSP data and the biobanks.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by XIHONG LIN
Powering whole genome sequence-based genetic discovery for common human diseases - Extended 2021-2022.
- Grant number: 10355760
- Fiscal year: 2021
- Amount: $499,800
- Project category:
Powering whole genome sequence-based genetic discovery for common human diseases
- Grant number: 10085285
- Fiscal year: 2020
- Amount: $499,800
- Project category:
Powering whole genome sequence-based genetic discovery for common human diseases
- Grant number: 10168752
- Fiscal year: 2020
- Amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 9120850
- Fiscal year: 2015
- Amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 10676866
- Fiscal year: 2015
- Amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 9321418
- Fiscal year: 2015
- Amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 9980301
- Fiscal year: 2015
- Amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 9752258
- Fiscal year: 2015
- Amount: $499,800
- Project category:
Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
- Grant number: 8955524
- Fiscal year: 2015
- Amount: $499,800
- Project category:
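Similar NSFC (National Natural Science Foundation of China) grants
Evaluating the biomarker potential of miR-205 in allergic rhinitis immunotherapy and the molecular mechanism of its targeted regulation of Th2 differentiation
- Grant number: 2025JJ80626
- Approval year: 2025
- Amount: CNY 0
- Project category: Provincial/municipal project
Exploring biomarkers in peripheral blood and at the maternal-fetal interface in preeclampsia patients using Olink proteomics and single-cell sequencing
- Grant number: 2025JJ80658
- Approval year: 2025
- Amount: CNY 0
- Project category: Provincial/municipal project
Biomarkers of mesenchymal stem cell therapy for acute respiratory distress syndrome
- Grant number: 2025JJ90236
- Approval year: 2025
- Amount: CNY 0
- Project category: Provincial/municipal project
Highly sensitive detection and study of non-small cell lung cancer (NSCLC) biomarkers in precision medicine
- Grant number: 2024Y9007
- Approval year: 2024
- Amount: CNY 500,000
- Project category: Provincial/municipal project
Early-warning biomarkers and interventional drugs for minimal residual disease after chemotherapy in acute T-lymphoblastic leukemia
- Grant number:
- Approval year: 2024
- Amount: CNY 0
- Project category: Provincial/municipal project
Neuroinflammatory biomarkers and mechanisms of post-stroke depression
- Grant number:
- Approval year: 2024
- Amount: CNY 200,000
- Project category: Provincial/municipal project
Discovery of biomarkers and drug targets for esophageal squamous cell carcinoma based on multi-omics analysis
- Grant number:
- Approval year: 2024
- Amount: CNY 0
- Project category: Provincial/municipal project
Single-cell multi-omics analysis of the epigenetic features of blastoids and identification of their biomarkers
- Grant number:
- Approval year: 2024
- Amount: CNY 150,000
- Project category: Provincial/municipal project
A Bacillus subtilis whole-cell sensor-mediated strategy for dynamic detection of colorectal cancer-related biomarkers in the gut environment
- Grant number: 82372355
- Approval year: 2023
- Amount: CNY 480,000
- Project category: General Program
Molecular mechanism by which APOC3, a potential biomarker of Guillain-Barré syndrome, regulates macrophage polarization by mediating metabolic reprogramming
- Grant number: 82371359
- Approval year: 2023
- Amount: CNY 490,000
- Project category: General Program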
Similar overseas grants
MRI and Biological Markers of Acute E-Cigarette Exposure in Smokers and Vapers
- Grant number: 10490338
- Fiscal year: 2021
- Amount: $499,800
- Project category:
MRI and Biological Markers of Acute E-Cigarette Exposure in Smokers and Vapers
- Grant number: 10353104
- Fiscal year: 2021
- Amount: $499,800
- Project category:
Investigating pollution dynamics of swimming pool waters by means of chemical and biological markers
- Grant number: 21K04320
- Fiscal year: 2021
- Amount: $499,800
- Project category: Grant-in-Aid for Scientific Research (C)
MRI and Biological Markers of Acute E-Cigarette Exposure in Smokers and Vapers
- Grant number: 10688286
- Fiscal year: 2021
- Amount: $499,800
- Project category:
Novel biological markers for immunotherapy and comprehensive genetic analysis in thymic carcinoma
- Grant number: 20K17755
- Fiscal year: 2020
- Amount: $499,800
- Project category: Grant-in-Aid for Early-Career Scientists
Examination of Biological Markers Associated with Neurobehavioral and Neuropsychological Outcomes in Military Veterans with a History of Traumatic Brain Injury
- Grant number: 10578649
- Fiscal year: 2019
- Amount: $499,800
- Project category:
Examination of Biological Markers Associated with Neurobehavioral and Neuropsychological Outcomes in Military Veterans with a History of Traumatic Brain Injury
- Grant number: 10295141
- Fiscal year: 2019
- Amount: $499,800
- Project category:
Examination of Biological Markers Associated with Neurobehavioral and Neuropsychological Outcomes in Military Veterans with a History of Traumatic Brain Injury
- Grant number: 10041708
- Fiscal year: 2019
- Amount: $499,800
- Project category:
Examination of Biological Markers Associated with Neurobehavioral and Neuropsychological Outcomes in Military Veterans with a History of Traumatic Brain Injury
- Grant number: 9776149
- Fiscal year: 2019
- Amount: $499,800
- Project category:
Combining biological and non-biological markers to develop a model predictive of treatment response for individuals with depression
- Grant number: 2063934
- Fiscal year: 2018
- Amount: $499,800
- Project category: Studentship