
A multi-level bias correction model for bulk and single-cell CUT&Tag data


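Basic information

Grant number:
10645980
Principal investigator:
Chongzhi Zang
Amount:
About $444,100
Host institution:
UNIVERSITY OF VIRGINIA
Host institution country:
United States
Project category:
Fiscal year:
2023
Funding country:
United States
Project period:
2023-09-01 to 2025-08-31
Project status:
Ongoing

Source:
https://reporter.nih.gov/project-details/10645980
Keywords: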
ATAC-seq Affect Basic Science Benchmarking Binding Binding Sites Bioinformatics Biological Biological Assay Cells ChIP-seq Characteristics Chromatin Chromatin Structure Complement Computer Models Computer software Computing Methodologies DNA Insertion Elements DNA-Protein Interaction Data Data Analyses Data Set Dependence Detection Disease Gene Expression Gene Expression Regulation Genetic Transcription Genome Goals Human Cell Line Human Genome Hyperactivity Individual Knowledge Measures Methods Modeling Names Pathogenesis Promoter Regions Research Resources Signal Transduction Statistical Models Techniques Tn5 transposase Transcriptional Regulation Work bioinformatics tool cancer cell computer framework cost data resource epigenomic profiling epigenomics experimental study functional genomics genome-wide genome-wide analysis histone modification human data improved innovation insight method development open source programs transcription factor transcriptome sequencing translational study

Project abstract

Histone modifications (HMs) and transcription factors (TFs) are key factors in maintaining cell identity by regulating cell-type-specific gene expression programs and chromatin structure. Both HMs and TFs can be aberrantly regulated during pathogenesis and are a major class of cancer cell dependencies. Precise genome-wide detection of HMs and TF binding is essential for a better understanding of transcriptional regulation. Cleavage Under Targets & Tagmentation (CUT&Tag) is an easy and low-cost epigenomic profiling method that can be performed on a low number of cells or even at the single-cell level. Thousands of CUT&Tag datasets have been generated for profiling TF binding sites and HMs since the advent of this technique, providing a valuable resource for functional genomics and disease research. CUT&Tag experiments rely on the hyperactive transposase Tn5 for tagmentation. Tn5 is subject to intrinsic sequence insertion biases, and the enrichment of Tn5-captured reads in accessible chromatin regions also confounds the distribution of CUT&Tag reads, especially for factors that are only weakly associated with chromatin accessibility. Both features introduce substantial biases into CUT&Tag data that confound data analysis. For example, strong CUT&Tag signal enrichment for the repressive histone modification H3K27me3 can be observed at actively transcribed gene promoter regions where chromatin is openly accessible but where ChIP-seq shows no H3K27me3 signal, indicating that the observed CUT&Tag signal is likely a false positive. The high sparsity of single-cell data makes the intrinsic biases more substantial than in bulk data, creating additional challenges in computational modeling and data analysis. For example, the average Tn5 intrinsic cleavage bias level varies across individual cells and confounds the cell clustering results from single-cell ATAC-seq data, which carry a Tn5 intrinsic bias similar to that of CUT&Tag.
Based on these preliminary observations and our group's existing work, we propose to develop computational models to accurately quantify both the open chromatin bias and the Tn5 intrinsic cleavage bias from CUT&Tag data at both the bulk and single-cell levels. Using the new model, we will characterize how open chromatin and intrinsic cleavage biases affect the detection of HM and TF binding sites in both bulk and single-cell CUT&Tag data. The bias correction model can be further incorporated into existing or new bioinformatics methods to detect HM/TF signals in both bulk and single-cell CUT&Tag data. This project focuses on developing a computational method for bias correction to improve CUT&Tag data analysis. The proposed computational method complements existing bioinformatics tools and will have broad applications in functional genomics and epigenomics research. The results from the proposed work will fill the knowledge gap in single-cell studies of chromatin dynamics and transcriptional regulation and could provide mechanistic insights for both basic science and translational studies.
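
To make the notion of an intrinsic sequence insertion bias concrete, the following is a minimal sketch of one common way such a bias can be quantified: comparing how often each k-mer appears at observed cut sites with how often it appears in the background sequence. It is an illustration only and is not the model proposed in this project; the function names, the toy sequence, the cut-site positions, the choice of k, and the pseudocount are all assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer in seq (background sequence composition)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cut_site_kmer_counts(seq, cut_sites, k):
    """Count the k-mer around each cut site (window starts k // 2 bases upstream)."""
    half = k // 2
    counts = Counter()
    for pos in cut_sites:
        start = pos - half
        # Skip cut sites whose window would run off either end of the sequence.
        if start >= 0 and start + k <= len(seq):
            counts[seq[start:start + k]] += 1
    return counts


def kmer_bias(seq, cut_sites, k=4, pseudocount=1.0):
    """Return {kmer: observed cut-site frequency / background frequency}.

    Values above 1 mean the k-mer is cut more often than its abundance
    in the background would suggest; values below 1 mean less often.
    """
    background = kmer_counts(seq, k)
    observed = cut_site_kmer_counts(seq, cut_sites, k)
    n_background = sum(background.values())
    n_observed = sum(observed.values())
    n_kmers = len(background)
    bias = {}
    for kmer, bg_count in background.items():
        # The pseudocount keeps unobserved k-mers from getting a bias of exactly 0.
        obs_freq = (observed.get(kmer, 0) + pseudocount) / (
            n_observed + pseudocount * n_kmers
        )
        bg_freq = bg_count / n_background
        bias[kmer] = obs_freq / bg_freq
    return bias


# Hypothetical toy data: every cut falls on an "AC" dinucleotide.
toy_seq = "ACGT" * 5
toy_cuts = [1, 5, 9]
toy_bias = kmer_bias(toy_seq, toy_cuts, k=2)
```

In this toy input the "AC" dinucleotide receives a bias above 1 and the other dinucleotides a bias below 1. A read count could then be divided by the bias of its cut-site k-mer as a crude correction, although a real analysis would also need to account for the open chromatin bias described above.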
