A Comprehensive Genomic Community Resource of Transcriptional Regulation

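Basic Information

Award number: 10625529
Principal investigator: Anshul Kundaje
Amount: $809,400
Grantee institution: UNIV OF MASSACHUSETTS MED SCH WORCESTER
Institution country: United States
Project category:
Fiscal year: 2022
Funding country: United States
Project period: 2022-06-01 to 2027-03-31
Project status: Ongoing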

Source: https://reporter.nih.gov/project-details/10625529
Keywords:
ATAC-seq Algorithms Atlases Automobile Driving Base Pairing Benchmarking Binding CRISPR/Cas technology Catalogs Cells ChIP-seq Chromatin Code Collaborations Collection Communities Community Outreach Computer Models DNA DNA Sequence Data Data Analyses Data Set Development Disease Education and Outreach Educational workshop Elements Epigenetic Process Exons Functional disorder Future Genes Genomics Histones Human Human BioMolecular Atlas Program Human Genome Human Genome Project Human body Individual International Interruption Maps Mediating Methods Modeling Nematoda Online Systems Organism Pattern Physiology Process Quality Control Registries Regulatory Element Research Research Personnel Resolution Resources Role Scheme Signal Transduction Specific qualifier value Techniques Technology Testing Time Tissues Training Trans-Omics for Precision Medicine Transcriptional Regulation Untranslated RNA Variant Visualization Work base cell type community building community setting data analysis pipeline data repository deep learning deep learning model deep sequencing design epigenome epigenomics experimental study follow-up genome wide association study in silico in vivo Model machine learning model model development novel online resource outreach predictive modeling public repository repository sequence learning syntax tool trait transcription factor

Project Abstract

The Human Genome Project (HGP) completed the first draft human genome sequence two decades ago. The HGP revealed that human complexity arises from only approximately 20,000 protein-coding genes, roughly the same number as much simpler organisms such as nematodes. Intricate patterns of transcriptional regulation, mediated by non-coding regulatory elements, specify the myriad cell types and states required for human complexity. Genome-wide association studies have since identified thousands of disease-associated variants, many of which impair the function of these non-coding elements and thereby disrupt transcriptional regulation. Thus, to better understand human physiology and pathophysiology, comprehensive atlases of regulatory elements are essential. Many previous efforts, including the International Human Epigenome Consortium (IHEC), the FANTOM Consortium, the Roadmap Epigenomics Project, and the ENCODE Project, have aimed to build comprehensive collections of regulatory elements, as well as computational models that better predict regulatory activity and reveal the sequence features underlying regulatory function. ENCODE (2003-2022) is a large-scale consortium effort that aims to annotate every functional non-coding element of the human genome; during our work on the project, we built a Registry of approximately 1 million human candidate cis-regulatory elements (cCREs). We further developed deep-learning approaches that model, at base-pair resolution, the transcription factor motif syntax underlying element function, and we built two web-based resources, SCREEN and Factorbook, to make our results accessible to the scientific community.
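As a rough illustration of what "motif syntax at base-pair resolution" starts from, and not a description of the authors' deep-learning models, the sketch below one-hot encodes a DNA sequence and scores every position against a position weight matrix (PWM), the classic representation of a transcription factor motif that convolutional sequence models effectively learn as filters. The toy `GATA` count matrix, the pseudocount of 0.5, and the uniform background are hypothetical choices made only for this example.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as an (L, 4) matrix with columns A, C, G, T; other characters (e.g. N) become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x

def pwm_log_odds(counts: np.ndarray, pseudocount: float = 0.5) -> np.ndarray:
    """Turn a (W, 4) count matrix into log2 odds against a uniform 0.25 background."""
    smoothed = counts + pseudocount
    probs = smoothed / smoothed.sum(axis=1, keepdims=True)
    return np.log2(probs / 0.25)

def scan(seq: str, log_odds: np.ndarray) -> np.ndarray:
    """Score every forward-strand window of the motif width; returns L - W + 1 scores."""
    x = one_hot(seq)
    width = log_odds.shape[0]
    return np.array([np.sum(x[i:i + width] * log_odds)
                     for i in range(len(seq) - width + 1)])

# Hypothetical toy motif: counts for a perfectly conserved "GATA" site
# (rows = motif positions, columns = A, C, G, T).
gata_counts = np.array([
    [0, 0, 10, 0],   # G
    [10, 0, 0, 0],   # A
    [0, 0, 0, 10],   # T
    [10, 0, 0, 0],   # A
], dtype=float)

scores = scan("CCGATACC", pwm_log_odds(gata_counts))
best = int(np.argmax(scores))  # index of the highest-scoring window start
```

Here the best window starts at index 2, where the sequence reads `GATA`. A deep-learning model replaces the single hand-built PWM with many learned filters and additional layers that capture how motifs combine, but the one-hot input and per-position scoring are the same basic ingredients.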
Here, we propose to extend this framework to build the Community Resource for Transcriptional Regulation (CRTR), a comprehensive atlas of non-coding regulatory elements and machine-learning models which will encompass community and consortium deep-sequencing data, both bulk and single cell, across a broad array of cell types and states. Our project has five aims. First, we aim to curate community and consortium data for inclusion in CRTR and perform uniform processing and quality control. Second, we aim to train deep-learning sequence models on bulk epigenetic datasets to identify transcription factor motif syntax driving regulatory element activity in distinct tissues and cell types. Third, we aim to train sequence models on single cell datasets to identify transcription factor motif syntax driving transcriptional regulation in high-resolution cell states and during cell state transitions. Fourth, we aim to use the aforementioned results to build comprehensive benchmark datasets and machine-learning model collections, which will aid future analysts in designing new models to predict regulatory readouts. Fifth, we aim to build a state-of-the-art web-based user interface to enable users to perform integrative analyses and in silico experimentation with CRTR, and hold workshops and other outreach to maximize the impact of the resource and its accessibility to the broader scientific community.
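One common form of the in silico experimentation mentioned in the fifth aim is in silico mutagenesis: predict how every possible single-base substitution changes a model's output, which is also a natural way to ask whether a variant disrupts a regulatory element. The sketch below assumes only a generic `predict` callable that maps a one-hot sequence to a number; `toy_model`, which returns the best match to a toy GATA motif, and its match/mismatch scores are hypothetical stand-ins for a trained sequence model, not components of CRTR.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as an (L, 4) matrix with columns A, C, G, T."""
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in BASES:
            x[pos, BASES.index(base)] = 1.0
    return x

def in_silico_mutagenesis(seq, predict):
    """Return an (L, 4) matrix of predict(mutant) - predict(reference)
    for every single-base substitution; reference-base entries stay 0."""
    ref = one_hot(seq)
    ref_score = predict(ref)
    effects = np.zeros_like(ref)
    for pos in range(len(seq)):
        for b in range(4):
            if ref[pos, b] == 1.0:
                continue  # skip the reference base itself
            mutant = ref.copy()
            mutant[pos] = 0.0
            mutant[pos, b] = 1.0
            effects[pos, b] = predict(mutant) - ref_score
    return effects

# Hypothetical stand-in model: best log-odds match to a toy "GATA" motif,
# with toy log2 scores for a matching and a mismatching base.
match, mismatch = np.log2(3.5), np.log2(1 / 6)
gata_log_odds = np.full((4, 4), mismatch)
for pos, base in enumerate("GATA"):
    gata_log_odds[pos, BASES.index(base)] = match

def toy_model(x: np.ndarray) -> float:
    width = gata_log_odds.shape[0]
    return max(float(np.sum(x[i:i + width] * gata_log_odds))
               for i in range(x.shape[0] - width + 1))

effects = in_silico_mutagenesis("CCGATACC", toy_model)
```

In this toy case, every substitution inside the `GATA` site (positions 2 to 5) lowers the prediction, while substitutions in the flanking bases leave it unchanged. With a real model, the same loop simply calls the trained network instead of `toy_model`, at a cost of three forward passes per base.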