Information Explorer: a Suite of Tools for Cross-study Genetic Loci Discovery

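Basic information

Grant number:
8145063
Principal investigator:
Yigal Arens
Amount:
$447,500
Host institution:
UNIVERSITY OF SOUTHERN CALIFORNIA
Host institution country:
United States
Project category:
Fiscal year:
2011
Funding country:
United States
Project period:
2011-07-19 to 2013-05-31
Project status:
Completed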

Source:
https://reporter.nih.gov/project-details/8145063
Keywords:
Archives Artificial Intelligence Authorization documentation Bioinformatics Biological Blood Cardiovascular Diseases Classification Complex Data Data Set Databases Development Documentation Epidemiologic Studies Foundations Future Genetic Genomics Genotype Heart Hereditary Disease Human Individual Informatics Information Retrieval Link Lung Machine Learning Maps Measurement Measures Meta-Analysis Methodology Methods Names Online Systems Outcome Performance Phenotype Postdoctoral Fellow Process Research Research Personnel Resources Single Nucleotide Polymorphism Sleep Solid Source System Systems Integration Techniques Technology Testing Time Training Validation Visit Work base cohort cost effective data structure database of Genotypes and Phenotypes experience gene environment interaction genetic association genetic epidemiology genetic variant genome wide association study graduate student improved interest meetings repository scale up software development symposium text searching tool trait usability web site

Project abstract

DESCRIPTION (provided by applicant): Databases such as dbGaP represent extremely valuable resources of data that have been assembled across multiple cohorts. The increasing development of cost-effective high-throughput genotyping and sequencing technologies is resulting in vast amounts of genetic data. While such databases were formed in order to archive and distribute the results of previously performed genetic association analyses, an increasing number of studies have provided de-identified individual-level genotypic and phenotypic data that are made available to outside researchers who have obtained the appropriate authorization. While the amount of data made available has increased dramatically in recent years, relatively little has been done to facilitate phenotype harmonization across studies. Many genetic epidemiologic studies of cardiovascular disease have multiple variables related to any given phenotype, resulting from different definitions and multiple measurements or subsets of data. A researcher searching such databases for the availability of phenotype and genotype combinations is confronted with a veritable mountain of variables to sift through. This often requires visiting multiple websites to gain additional information about variables listed in the databases, and examining data distributions to assess similarities across cohorts. While the naming strategy for genetic variants is largely standardized across studies (e.g., "rs" numbers for single nucleotide polymorphisms or SNPs), this is often not the case for phenotype variables. For a given study, there are often numerous versions of phenotypic variables. Researchers currently have to analyze and compare increasingly large numbers of variables that have varying degrees of documentation associated with them to obtain the desired information. This is a time-consuming process that may still miss the most appropriate variables.
Moreover, every researcher who wants to compare the same datasets often needs to start from scratch, since there are no tools to share the phenotype comparison results. The availability of informatics tools to make phenotype mapping more efficient and improve its accuracy, along with intuitive phenotype query tools, would provide a major resource for researchers utilizing these databases. The tools we are proposing would allow researchers to (1) quickly obtain the information needed to assess whether a specific study will be useful for the hypothesis of interest; (2) exclude variables that do not meet research criteria; (3) ascertain which studies have combinations of phenotype and genetic information of interest; and (4) more easily expand research questions beyond the most basic main effects to more complex analyses such as gene-by-environment interactions and multivariate tests incorporating multiple phenotypes. The increased utility will also enable larger meta-analyses to be performed, as researchers will be able to more quickly home in on outcomes, exclusionary variables and covariates of interest, leading to increased statistical power to detect genetic associations.
PUBLIC HEALTH RELEVANCE: While the amount of genomic data (e.g., GWAS, sequencing, etc.) made available has increased dramatically in recent years, relatively little has been done to facilitate phenotype harmonization across studies. The tools we are proposing would allow researchers to quickly identify data sets of interest, expand research questions beyond the most basic main effects to more complex analyses such as gene-by-environment interactions and multivariate tests incorporating multiple phenotypes, and perform larger meta-analyses easily by homing in on outcomes, exclusionary variables and covariates of interest, with increased statistical power to detect genetic associations.