High-performance mixed model toolset for integrative omics analysis of big data
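Basic information
- Grant number: 9312511
- Principal investigator:
- Amount: $584,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-04-15 to 2020-03-31
- Project status: Completed
- Source: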
- Keywords: Algorithms; Attention; Beds; Big Data; Biological; Biological Models; Biology; Cloud Computing; Code; Collaborations; Communities; Complex; Complex Genetic Trait; Computer software; Data; Data Analyses; Data Set; Development; Disease; Epigenetic Process; Equation; Family; Genetic; Genetic Epistasis; Genomics; Genotype; Goals; Health; Heterogeneity; Human; Investments; Memory; Meta-Analysis; Methodology; Mitochondria; Modeling; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Performance; Phase; Phenotype; Play; Population; Process; Publishing; Research; Research Personnel; Resource Allocation; Resources; Role; Sample Size; Sampling; Sequence Analysis; Shapes; System; Systems Biology; Technology; Testing; Time; Trans-Omics for Precision Medicine; Variant; Weight; Work; analytical method; animal breeding; base; biological systems; cloud based; cohort; cost; data access; data management; data space; epigenome-wide association studies; file format; flexibility; genetic analysis; genetic pedigree; genomic data; improved; insight; large scale production; method development; novel; novel strategies; precision medicine; pressure; rare variant; response; scale up; simulation; simulation software; terabyte; tool; trait; virtual; whole genome; working group
Project abstract
PROJECT SUMMARY/ABSTRACT
The recent large-scale production of whole genome sequence and other multi-omics data in TOPMed and other
projects calls for parallel development of a comprehensive, powerful and flexible toolset capable of large data
management, analysis and integration. Mega/integrated analyses are essential to fully utilize these data to
elucidate the complexity of the biological mechanisms and advance our understanding of complex trait biology to
drive precision medicine. TOPMed estimates that the VCF for 60,000 subjects will contain 400M variants and
require 100TB of space, and much of our current genetic analysis toolset does not scale up to these data sizes.
For rare variant analysis, mixed model mega analysis is more powerful than meta-analysis as mega analysis can
include additional random effects to account for genetic relatedness between all subjects and cross-study
phenotypic, genetic and environmental heterogeneity. However, cross-study mega analysis within the mixed
model framework is still uncharted territory. We believe mega analysis will spur more creative analysis approaches
provided the needed toolsets are available. In cloud computing “time is money”, and new approaches are
required to solve structural differences in resource allocation and data access compared to local computing.
MMAP (Mixed Models for Analysis of Pedigrees/Populations) is robust mixed model software that has already
supported a published mixed model analysis on a sample size of 90,000 that included dominance variance, and that already has a
cloud-efficient version of mixed model rare variant analysis. The goal of this proposal is to further expand and
improve this toolset to deliver to the research community a flexible, versatile, and comprehensive cross-platform
mixed model toolset scalable to efficient local and cloud analysis of large WGS and omics data. We plan to
implement several new features in our toolset including: 1) Efficient binary genotype file format for optimal
storage of terabyte VCF genotypes. 2) Large-scale modeling of non-additive variation such as dominance, X-linked,
mitochondrial and epistatic effects. 3) Optimized rare variant analysis with flexible integration of annotation and
variant weighting resources. 4) Optimized expression/epigenome-wide association (EWA) analysis. 5)
Comprehensive multi-omics integration into the mixed model as fixed and random effects. 6) Development of
multi-omics simulation software to guide systems biology modeling. 7) Integration of mixed model equations for
prediction, as used in animal breeding. This proposal will deliver to the research community an analysis toolset that will
push research boundaries well beyond additive SNP association to a space filled with complex biological fixed
and random effects models integrating the full spectrum of multi-omics data. We plan to develop a multi-omics
simulation tool to better understand the complex evolutionary processes that shape the complex trait landscape.
Our toolset will be extensively shaped by collaboration with TOPMed working groups to meet analysis priorities
and develop analysis plans. Our toolset will surely evolve in novel and unexpected directions in response to new
ideas and challenges as we dive deeper into this unique data set.
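The storage argument behind feature 1 can be made concrete with a small sketch. The abstract does not describe the planned binary format, so the layout below is purely hypothetical: each biallelic hard call is stored as a 2-bit code (0, 1 or 2 copies of the alternate allele, 3 for missing), four calls per byte, with each variant padded to a whole byte. The function names are illustrative and do not come from MMAP.

```python
"""Sketch: pack biallelic genotype calls into 2 bits each (hypothetical layout)."""

# Hypothetical 2-bit codes: alternate-allele count 0, 1, 2, and 3 for a missing call.
MISSING = 3


def pack_genotypes(calls):
    """Pack a sequence of codes in {0, 1, 2, 3} into bytes, four calls per byte."""
    packed = bytearray((len(calls) + 3) // 4)
    for i, code in enumerate(calls):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        # Call i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n_calls):
    """Recover the first n_calls codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_calls)]


def packed_size_bytes(n_subjects, n_variants):
    """Bytes needed when each variant's calls are padded to a whole byte."""
    return n_variants * ((n_subjects + 3) // 4)


if __name__ == "__main__":
    calls = [0, 1, 2, MISSING, 2, 0]
    blob = pack_genotypes(calls)
    assert unpack_genotypes(blob, len(calls)) == calls
    # Figures quoted in the abstract: 60,000 subjects and 400M variants.
    print(packed_size_bytes(60_000, 400_000_000) / 1e12, "TB")
```

Under these assumptions, hard calls for 60,000 subjects and 400M variants take 6 TB, against the 100TB the abstract quotes for the VCF. The comparison is only indicative, since a VCF can carry per-genotype fields that a 2-bit code drops.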
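For readers unfamiliar with the "mixed model equations" mentioned in feature 7, the standard textbook form (Henderson's equations) is sketched below. The notation is generic and assumed here, not taken from the proposal, and the abstract does not say which formulation MMAP would implement.

```latex
% Linear mixed model (generic notation, assumed for illustration):
%   y: phenotypes, X and Z: design matrices for fixed and random effects,
%   beta: fixed effects, u: random effects, e: residuals.
\[
  y = X\beta + Zu + e, \qquad
  u \sim N(0, G), \quad e \sim N(0, R)
\]
% Henderson's mixed model equations for the estimates of beta and u:
\[
  \begin{bmatrix}
    X^{\top} R^{-1} X & X^{\top} R^{-1} Z \\
    Z^{\top} R^{-1} X & Z^{\top} R^{-1} Z + G^{-1}
  \end{bmatrix}
  \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
  =
  \begin{bmatrix} X^{\top} R^{-1} y \\ Z^{\top} R^{-1} y \end{bmatrix}
\]
```

In this form the solution for u gives the predicted random effects. A model with dominance variance, as mentioned in the abstract, would add a second random effect with its own covariance matrix.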
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by JEFFREY R O'CONNELL
Elucidating the ancestry-specific genetic and environmental architecture of cardiometabolic traits across All of Us ethnic groups
- Grant number: 10796028
- Fiscal year: 2023
- Funding amount: $584,800
- Project category:
Genome-wide Association in Families: Data Integrity, Design and Methods Issue
- Grant number: 7104529
- Fiscal year: 2006
- Funding amount: $584,800
- Project category:
Genome-wide Association in Families: Data Integrity, Design and Methods Issue
- Grant number: 7246523
- Fiscal year: 2006
- Funding amount: $584,800
- Project category:
Genome-wide Association in Families: Data Integrity, Design and Methods Issue
- Grant number: 7421072
- Fiscal year: 2006
- Funding amount: $584,800
- Project category:
RAPID MULTIPOINT METHODS FOR MAPPING COMPLEX DISEASES
- Grant number: 2864800
- Fiscal year: 1998
- Funding amount: $584,800
- Project category:
RAPID MULTIPOINT METHODS FOR MAPPING COMPLEX DISEASES
- Grant number: 6043142
- Fiscal year: 1998
- Funding amount: $584,800
- Project category:
RAPID MULTIPOINT METHODS FOR MAPPING COMPLEX DISEASES
- Grant number: 6169588
- Fiscal year: 1998
- Funding amount: $584,800
- Project category:
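Similar NSFC grants
Multimodal ultrasound VisTran-Attention network for assessing the feasibility of fertility-sparing surgery in early-stage cervical cancer
- Grant number:
- Approval year: 2022
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Ultrasomics-Attention Siamese network for early, precise evaluation of immunotherapy in intrahepatic cholangiocarcinoma
- Grant number:
- Approval year: 2022
- Funding amount: 520,000 CNY
- Project category: General Program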
Similar overseas grants
Development of social attention indicators of emerging technologies and science policies with network analysis and text mining
- Grant number: 24K16438
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Grant-in-Aid for Early-Career Scientists
Improving Flexible Attention to Numerical and Spatial Magnitudes in Young Children
- Grant number: 2410889
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Continuing Grant
The Information-Attention Tradeoff: Toward an Understanding of the Fundamentals of Online Attention
- Grant number: 2343858
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Continuing Grant
The everyday learning opportunities of young children with attention and motor difficulties: From understanding constraints to reshaping intervention
- Grant number: MR/X032922/1
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Fellowship
Towards a cognitive process model of how attention and choice interact
- Grant number: DP240102605
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Discovery Projects
DDRIG in DRMS: Communicating risks in a sensational media environment-Using short video multimodal features to attract attention and reduce psychological reactance for persuasion
- Grant number: 2343506
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Standard Grant
Assessing the Influence of Reading Fiction on Multiple Tests of Attention
- Grant number: 24K16033
- Fiscal year: 2024
- Funding amount: $584,800
- Project category: Grant-in-Aid for Early-Career Scientists
Analysis of heterogeneous set matching methods based on the attention mechanism and proposal of a new method
- Grant number: 23K11218
- Fiscal year: 2023
- Funding amount: $584,800
- Project category: Grant-in-Aid for Scientific Research (C)
Effects of instruction using focus of attention on performance of chest compressions.
- Grant number: 23K09887
- Fiscal year: 2023
- Funding amount: $584,800
- Project category: Grant-in-Aid for Scientific Research (C)
Assessing the Influence of SDGs Formulation on Managers' Perceptions and CSR Activities: An Attention-based View
- Grant number: 23K01515
- Fiscal year: 2023
- Funding amount: $584,800
- Project category: Grant-in-Aid for Scientific Research (C)