Computational Methods for Next-Generation GWAS
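Basic Information
- Award number: 9910009
- Principal investigator:
- Amount: $19,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-05-01 to 2020-07-31
- Project status: Completed
- Source: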
- Keywords: Agriculture; Benchmarking; Biology; Breeding; Communities; Computer software; Computing Methodologies; Coupled; Culicidae; DNA Sequence; Data; Dimensions; Environment; Evolution; Gene Frequency; Genetic; Genotype; Geographic Locations; Goals; Guidelines; Haplotypes; Health; Heart Diseases; Height; Human; Image; Learning; Linear Models; Linear Regressions; Link; Machine Learning; Measures; Methodology; Methods; Modeling; Non-Insulin-Dependent Diabetes Mellitus; Oligogenic Traits; Output; Performance; Phenotype; Polygenic Traits; Population; Population Genetics; Population Heterogeneity; Positioning Attribute; Process; Public Health; Running; Sampling; Signal Transduction; Spatial Distribution; Stratification; Structure; Sum; Techniques; Testing; Training; Trans-Omics for Precision Medicine; Variant; autoencoder; base; biobank; cohort; deep learning; deep neural network; diverse data; experience; genome wide association study; genome-wide; genomic data; human data; image reconstruction; improved; large scale simulation; learning strategy; machine learning algorithm; neural network; next generation; polygenic risk score; population stratification; simulation; statistics; supervised learning; tool; trait
Project Summary/Abstract
Predicting phenotypes from DNA sequence variation is a major goal for genetics with potential
applications in evolutionary biology, crop breeding, and public health. A central challenge in this task is
separating genetic and environmental effects on phenotypes. In natural populations, breeding structure is often
correlated with the environment across space such that different subpopulations experience different
environments. For genome-wide association studies (GWAS) this creates a problem: genetic and
environmental effects can be confounded by population structure, leading to inflated test statistics and low
predictive power across populations (Bulik-Sullivan et al. 2015; Mathieson and McVean 2012). Understanding
when association studies are biased by population stratification and creating better methods to correct for it are
thus important challenges for population genetics over the next decade.
To identify conditions under which existing methods of population stratification correction are subject to
bias and develop robust new alternatives suitable for use with the continental-scale genomic datasets that are
now routinely available for humans, we propose to use simulations and machine learning to separate the
signals of fine-scale ancestry from polygenic phenotype association. In our first aim we will develop simulations
of polygenic phenotype evolution in continuous space and use the output to evaluate existing methods of
stratification control including linear mixed models, PC correction, and LD score regression. In this aim we will
seek to identify the regions of parameter space – i.e. the strength of isolation by distance and the spatial
distribution of environmental variation – in which existing methods can be expected to produce reliable effect
size estimates, and establish guidelines for applications of GWAS to structured populations.
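
The confounding described above, and the PC correction named among the existing methods, can be illustrated with a toy simulation. The sketch below is not the simulation framework proposed in this project: it assumes two discrete subpopulations rather than continuous space, a phenotype with no genetic component at all, and a single principal component obtained by power iteration. All function names and parameter values are hypothetical choices made for illustration. It reports the genomic inflation factor (the median association statistic divided by the median of a 1-d.f. chi-square distribution) with and without the top PC as a covariate.

```python
import math
import random
import statistics

# Illustrative sketch only: names and parameter values are assumptions,
# not taken from the proposal.
CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def simulate(n_per_pop=150, n_snps=200, max_shift=0.15, env_effect=1.0, seed=1):
    """Two subpopulations with shifted allele frequencies.

    The phenotype depends only on the subpopulation's environment,
    so every SNP association is a false positive.
    """
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(n_snps)]
    genotypes, phenotype = [], []
    for pop in (0, 1):
        sign = 1 if pop else -1
        for _ in range(n_per_pop):
            row = []
            for j in range(n_snps):
                p = base[j] + sign * shift[j]
                # Diploid genotype: number of alternate alleles (0, 1, or 2).
                row.append(int(rng.random() < p) + int(rng.random() < p))
            genotypes.append(row)
            phenotype.append(env_effect * pop + rng.gauss(0.0, 1.0))
    return genotypes, phenotype


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(v, covariate):
    """Center v and remove its projection on an already-centered covariate."""
    v = center(v)
    if covariate is None:
        return v
    beta = dot(v, covariate) / dot(covariate, covariate)
    return [a - beta * c for a, c in zip(v, covariate)]


def top_pc(genotypes, n_iter=25, seed=2):
    """Leading PC of the column-centered genotype matrix, by power iteration."""
    cols = [center(col) for col in zip(*genotypes)]  # one centered list per SNP
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(genotypes))]
    for _ in range(n_iter):
        w = [dot(col, v) for col in cols]  # X^T v
        v = [sum(col[i] * wj for col, wj in zip(cols, w)) for i in range(len(v))]  # X w
        norm = math.sqrt(dot(v, v))
        v = [a / norm for a in v]
    return center(v)


def genomic_inflation(genotypes, phenotype, covariate=None):
    """Median per-SNP association statistic over the 1-d.f. chi-square median."""
    n = len(phenotype)
    k = 0 if covariate is None else 1
    y = residualize(phenotype, covariate)
    stats = []
    for col in zip(*genotypes):
        x = residualize(col, covariate)
        sxx = dot(x, x)
        if sxx < 1e-12:  # skip monomorphic SNPs
            continue
        r2 = dot(x, y) ** 2 / (sxx * dot(y, y))
        stats.append((n - k - 2) * r2 / (1.0 - r2))  # squared t statistic
    return statistics.median(stats) / CHI2_1DF_MEDIAN


genotypes, phenotype = simulate()
lam_raw = genomic_inflation(genotypes, phenotype)
lam_pc = genomic_inflation(genotypes, phenotype, covariate=top_pc(genotypes))
print(f"lambda without correction: {lam_raw:.2f}")
print(f"lambda with PC1 covariate: {lam_pc:.2f}")
```

Under these assumptions the uncorrected inflation factor should sit well above 1, while including the top PC should bring it close to 1. With discrete, well-separated groups a single PC is enough; the continuous isolation-by-distance setting targeted in this aim is the harder case that the toy model does not capture.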
We will then train machine learning algorithms on real genotype data from humans and mosquitoes to
describe continuous structure in large spatial samples using a variational autoencoder, a dimensionality
reduction technique based on deep neural networks that can take advantage of both allele frequency and
haplotype-based measures of differentiation in a single analysis and thus offer improved control of stratification
inflation in GWAS relative to the now standard PCA regression approach. Last, we will apply deep learning
techniques to the problem of linking phenotypes and genotypes in structured samples by training neural
networks on simulated phenotypes and empirical genetic data. By training our networks on empirical genetic
data and incorporating contextual information about surrounding haplotype structure into the model, our
networks should learn to discriminate causal associations from false positives created by population structure
in the sample cohort, which will improve performance when attempting to identify associations with the real
phenotype. These methods will be applied to existing genomic datasets of height in humans, tested against the
current state-of-the-art approaches, and packaged as scalable software for the broader scientific community.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Christopher J Battey
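Similar NSFC Grants
Research on DEA-Benchmarking Methods and Dynamic Games for Enterprise Performance Evaluation
- Award number: 70571028
- Year approved: 2005
- Funding amount: 165,000 yuan
- Project category: General Program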
Similar Overseas Grants
An innovative EDI data, insights & peer benchmarking platform enabling global business leaders to build data-led EDI strategies, plans and budgets.
- Award number: 10100319
- Fiscal year: 2024
- Funding amount: $19,300
- Project category: Collaborative R&D
BioSynth Trust: Developing understanding and confidence in flow cytometry benchmarking synthetic datasets to improve clinical and cell therapy diagnos
- Award number: 2796588
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Studentship
Collaborative Research: SHF: Medium: A Comprehensive Modeling Framework for Cross-Layer Benchmarking of In-Memory Computing Fabrics: From Devices to Applications
- Award number: 2347024
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Standard Grant
Elements: CausalBench: A Cyberinfrastructure for Causal-Learning Benchmarking for Efficacy, Reproducibility, and Scientific Collaboration
- Award number: 2311716
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Standard Grant
Benchmarking collisional rates and hot electron transport in high-intensity laser-matter interaction
- Award number: 2892813
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Studentship
Collaborative Research: BeeHive: A Cross-Problem Benchmarking Framework for Network Biology
- Award number: 2233969
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Continuing Grant
FET: Medium: Quantum Algorithms, Complexity, Testing and Benchmarking
- Award number: 2311733
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Continuing Grant
Establishing and benchmarking advanced methods to comprehensively characterize somatic genome variation in single human cells
- Award number: 10662975
- Fiscal year: 2023
- Funding amount: $19,300
- Project category:
Collaborative Research: BeeHive: A Cross-Problem Benchmarking Framework for Network Biology
- Award number: 2233968
- Fiscal year: 2023
- Funding amount: $19,300
- Project category: Continuing Grant