
Statistical Power Calculations for ChIP-seq experiments


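Basic information

Project number:
8284083
Principal investigator:
Sunduz Keles
Amount:
approx. $184,100
Host institution:
UNIVERSITY OF WISCONSIN-MADISON
Institution country:
United States
Project type:
Fiscal year:
2012
Funding country:
United States
Project period:
2012-05-01 to 2014-03-31
Project status:
Completed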

Source:
https://reporter.nih.gov/project-details/8284083
Keywords:
Alleles; Base Sequence; Binding; Binomial Model; Bioconductor; Biologic Characteristic; Biological; Cells; ChIP-seq; Communities; Computer Analysis; Computer software; DNA; Data; Data Analyses; Data Set; Databases; Detection; Development; Diagnosis; Disease; Family; Gene Expression; Genetic; Genome; Genomics; Goals; Guanine + Cytosine Composition; Immune Sera; Immunoglobulin G; Individual; Lead; Letters; Location; Maps; Methods; Modeling; Play; Public Health; Reading; Research; Research Personnel; Resources; Role; Sampling; Simulate; Staging; Statistical Models; Technology; Tissues; Training; United States National Institutes of Health; Validation; Variant; base; chromatin immunoprecipitation; design; epigenomics; genome sequencing; genome wide association study; genome-wide; human disease; next generation; novel; programs; research study; simulation; software development; tool; transcription factor

Project abstract

DESCRIPTION (provided by applicant): The advent of high-throughput next generation sequencing (NGS) technologies has revolutionized the fields of genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. Among the NGS applications, ChIP-seq (chromatin immunoprecipitation followed by NGS) is perhaps the most successful to date. ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and to map epigenomic marks. Both play crucial roles in programming cell-specific gene expression; their genome-wide mapping can therefore significantly advance our ability to understand and diagnose human diseases. Although basic analysis tools for ChIP-seq data are rapidly increasing in number, there has not been much progress on the design problems of ChIP-seq experiments. A challenging question that researchers planning a ChIP-seq experiment need to answer is: how deeply should the ChIP and the control samples be sequenced? The answer depends on multiple factors, some of which can be set by the experimenter based on pilot/preliminary data. The sequencing depth of a ChIP-seq experiment is one of the key factors that determine whether or not all the underlying targets (e.g., binding locations or epigenomic profiles) can be identified with a targeted power. This is especially important when the goal is the analysis of individual-to-individual and allele-specific variation of transcription factor binding and epigenomic profiles. Insufficient sequencing depths may lead to spurious differences in binding or epigenome profiles.
In this proposal, we aim to develop a general framework for power calculations in ChIP-seq experiments, considering statistical models commonly used in ChIP-seq analysis, with three specific aims: (1) Power calculations based on the conditional Binomial model; (2) Power calculations based on the Poisson and Negative Binomial regression models; (3) A power calculation tool for GALAXY and Bioconductor. This project will be accomplished through a combination of theoretical/methodological development, simulation, computational analysis, and experimental validation. Methods will be developed and evaluated using datasets from the ENCODE, modENCODE, and RoadMap Epigenomics consortia as well as novel datasets from collaborators. Statistical resources generated from the project, which will be disseminated in publicly available software, will provide essential tools for the efficient design of ChIP-seq experiments.

PUBLIC HEALTH RELEVANCE: The proposed research is relevant to public health because capturing genome-wide binding of transcription factors and epigenomic information by ChIP-seq technology is invaluable for comprehensively understanding development, differentiation, and disease. The design of ChIP-seq experiments presents unprecedented challenges. We will develop a statistical framework for power calculations in designing ChIP-seq experiments and disseminate results and software to the research community.
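Aim (1) in the abstract names power calculations based on the conditional Binomial model but gives no formulas. As a rough illustration only, and not the project's actual method, the sketch below assumes a single candidate region whose ChIP and control read counts are independent Poisson variables, so that, conditional on their total n, the ChIP count is Binomial(n, p). The function names, the `depth_ratio` and `fold` parameters, and the one-sided exact test are all assumptions made for this example.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def conditional_binomial_power(n, depth_ratio, fold, alpha=0.05):
    """Power of a one-sided exact binomial test for enrichment in one region.

    Assumed setup (hypothetical, for illustration):
      n           total reads (ChIP + control) observed in the region
      depth_ratio ChIP sequencing depth divided by control sequencing depth
      fold        true ChIP/control enrichment under the alternative
    """
    # Under the null (no enrichment) the ChIP share of reads reflects depth only.
    p0 = depth_ratio / (depth_ratio + 1)
    # Under the alternative the ChIP rate is `fold` times the control rate.
    p1 = fold * depth_ratio / (fold * depth_ratio + 1)
    # Smallest critical value k whose null tail probability is at most alpha.
    # The loop always terminates: at k = n + 1 the tail probability is 0.
    for k in range(n + 2):
        if binom_upper_tail(k, n, p0) <= alpha:
            break
    return binom_upper_tail(k, n, p1)


if __name__ == "__main__":
    # Compare power at two region read totals for equal depths and 3-fold enrichment.
    for total in (20, 200):
        print(total, round(conditional_binomial_power(total, 1.0, 3.0), 3))
```

Under these assumptions, deeper sequencing raises the expected read total n in a region, which is how sequencing depth enters the power. The exact tail sum is only practical for modest n (up to a few hundred reads); a genome-wide calculation would also need a multiple-testing adjustment of `alpha`, which this sketch omits.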
