A Comprehensive Genomic Community Resource of Transcriptional Regulation
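Basic information
- Grant number: 10625529
- Principal investigator:
- Amount: $809,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-06-01 to 2027-03-31
- Project status: Ongoing
- Source: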
- Keywords: ATAC-seq; Algorithms; Atlases; Automobile Driving; Base Pairing; Benchmarking; Binding; CRISPR/Cas technology; Catalogs; Cells; ChIP-seq; Chromatin; Code; Collaborations; Collection; Communities; Community Outreach; Computer Models; DNA; DNA Sequence; Data; Data Analyses; Data Set; Development; Disease; Education and Outreach; Educational workshop; Elements; Epigenetic Process; Exons; Functional disorder; Future; Genes; Genomics; Histones; Human; Human BioMolecular Atlas Program; Human Genome; Human Genome Project; Human body; Individual; International; Interruption; Maps; Mediating; Methods; Modeling; Nematoda; Online Systems; Organism; Pattern; Physiology; Process; Quality Control; Registries; Regulatory Element; Research; Research Personnel; Resolution; Resources; Role; Scheme; Signal Transduction; Specific qualifier value; Techniques; Technology; Testing; Time; Tissues; Training; Trans-Omics for Precision Medicine; Transcriptional Regulation; Untranslated RNA; Variant; Visualization; Work; base; cell type; community building; community setting; data analysis pipeline; data repository; deep learning; deep learning model; deep sequencing; design; epigenome; epigenomics; experimental study; follow-up; genome wide association study; in silico; in vivo Model; machine learning model; model development; novel; online resource; outreach; predictive modeling; public repository; repository; sequence learning; syntax; tool; trait; transcription factor
Project abstract
Project Summary/Abstract
The Human Genome Project (HGP) completed the first draft human genome sequence two decades ago. The
HGP revealed that human complexity arises from only approximately 20,000 coding genes, roughly the same
number as much simpler organisms such as nematodes. Intricate patterns of transcriptional regulation mediated
by non-coding regulatory elements specify the myriad cell types and states required for human complexity.
Genome-wide association studies have subsequently identified thousands of disease-associated variants, many
of which interrupt the function of these non-coding elements to disrupt transcriptional regulation. Thus, in order
to better understand human physiology and pathophysiology, comprehensive atlases of regulatory elements are
essential. Many previous efforts, including the International Human Epigenome Consortium (IHEC), the
FANTOM Consortium, the Roadmap Epigenomics Project, and the ENCODE Project, have aimed to build
comprehensive collections of regulatory elements, as well as computational models to better predict regulatory
activity and understand the sequence features underlying regulatory function. ENCODE (2003-2022) is a
large-scale consortium effort which aims to annotate every functional non-coding element of the human genome;
during our work on the project, we built a Registry of approximately 1 million human candidate cis-regulatory
elements (cCREs). We further developed deep-learning approaches which model the transcription factor motif
syntax that underlies element function at base-pair resolution and built two web-based resources, SCREEN and
Factorbook, to make our results accessible to the scientific community. Here, we propose to extend this
framework to build the Community Resource for Transcriptional Regulation (CRTR), a comprehensive atlas of
non-coding regulatory elements and machine-learning models which will encompass community and consortium
deep-sequencing data, both bulk and single cell, across a broad array of cell types and states. Our project has
five aims. First, we aim to curate community and consortium data for inclusion in CRTR and perform uniform
processing and quality control. Second, we aim to train deep-learning sequence models on bulk epigenetic
datasets to identify transcription factor motif syntax driving regulatory element activity in distinct tissues and cell
types. Third, we aim to train sequence models on single cell datasets to identify transcription factor motif syntax
driving transcriptional regulation in high-resolution cell states and during cell state transitions. Fourth, we aim to
use the aforementioned results to build comprehensive benchmark datasets and machine-learning model
collections, which will aid future analysts in designing new models to predict regulatory readouts. Fifth, we aim
to build a state-of-the-art web-based user interface to enable users to perform integrative analyses and in silico
experimentation with CRTR, and hold workshops and other outreach to maximize the impact of the resource and
its accessibility to the broader scientific community.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Anshul Kundaje
Multi-Omics DACC: The Data Analysis and Coordination Center for the collaborative multi-omics for health and disease initiative
- Grant number: 10744561
- Fiscal year: 2023
- Funding amount: $809,400
- Project category:
A Comprehensive Genomic Community Resource of Transcriptional Regulation
- Grant number: 10411262
- Fiscal year: 2022
- Funding amount: $809,400
- Project category:
A Comprehensive Genomic Community Resource of Transcriptional Regulation
- Grant number: 10842047
- Fiscal year: 2022
- Funding amount: $809,400
- Project category:
Identifying causal genetic variants and molecular mechanisms impacting mental health
- Grant number: 10571911
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Identifying causal genetic variants and molecular mechanisms impacting mental health
- Grant number: 10380573
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Predicting context-specific molecular and phenotypic effects of genetic variation through the lens of the cis-regulatory code
- Grant number: 10659170
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Predicting context-specific molecular and phenotypic effects of genetic variation through the lens of the cis-regulatory code
- Grant number: 10297562
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Predicting context-specific molecular and phenotypic effects of genetic variation through the lens of the cis-regulatory code
- Grant number: 10474459
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Multi-omic functional assessment of novel AD variants using high-throughput and single-cell technologies
- Grant number: 10684210
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Multi-omic functional assessment of novel AD variants using high-throughput and single-cell technologies
- Grant number: 10436207
- Fiscal year: 2021
- Funding amount: $809,400
- Project category:
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $809,400
- Project category: Research Grant