Measuring functional similarity between transcriptional enhancers using deep learning

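Basic information

  • Grant number:
    10302539
  • Principal investigator:
  • Amount:
    $367,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-09-01 to 2024-08-31
  • Project status:
    Completed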

Project abstract

PROJECT SUMMARY

Understanding transcriptional regulation remains a major task in molecular biology. Enhancers are genetic elements that regulate when and where genes are expressed, and at what levels. They are hard to discover because their locations and orientations are not constrained with respect to their target genes. Several diseases, and susceptibility to certain diseases, are linked to mutations and variants in enhancers. Multiple experimental and computational methods have been developed for locating enhancers. Computational methods are better suited to the large number of genomes now being sequenced because they are faster, cheaper, and less labor-intensive than experimental methods. Despite the many computational tools available, we lack a sophisticated tool that can measure the similarity in enhancer activity of a pair of sequences. We propose using Deep Artificial Neural Networks (DANNs) to develop such a tool.

The long-term objective of this project is to decipher the code governing gene regulation, with the following specific aims: (i) design a computational tool for measuring enhancer-enhancer similarity, (ii) validate up to 96 putative enhancers experimentally, (iii) understand enhancer grammar, and (iv) annotate enhancers in more than 50 insect genomes. To achieve these aims, we propose a novel application of DANNs. Current tools use DANNs to answer a yes-no question: does a sequence have similar activity to the tissue-specific enhancers comprising a particular training set of known enhancers? These approaches require training a separate network for each tissue, which leads to inconsistent performance across tissues. Instead, we use a DANN to answer a related but different question: does this sequence have similar enhancer activity to a single known tissue-specific enhancer? This deep network should perform consistently across cell types because it is trained on pairs of sequences (not on individual sequences, as in the available tools) representing all tissues for which enhancers are known.

The DANN is trained to recognize sequence pairs with similar enhancer activities and pairs with dissimilar activities. The dissimilar pairs include (i) two enhancers active in two different tissues, (ii) one enhancer and a random genomic sequence, and (iii) two random genomic sequences. The tool outputs a score between 0 and 1 indicating how similar the enhancer activities of the two sequences are. Using a much simpler machine learning algorithm than DANNs, we demonstrate that pairs with similar enhancer activities can be separated with very high accuracy from pairs of random genomic sequences and from pairs of one enhancer and a random genomic sequence.

The new tool has many important potential applications, including consistent annotation of enhancers across cell types and related species. It can annotate enhancers active in a cell type that has only a small number of known enhancers, and it can annotate enhancers in related genomes when a set of known enhancers is demarcated in one of them. Discovering new transcription factor binding sites is another potential application. The proposed tool can also facilitate the study of enhancer “design principles” and of the effects of variants. Such applications will advance our field.
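
The pair-based training set described in the abstract could be assembled along the following lines. This is a minimal sketch, not the project's implementation: the function name `build_pairs`, the tissue names, the toy sequences, and the choice to sample one random background sequence per enhancer are all assumptions made for illustration. It also assumes that each enhancer is active in exactly one tissue.

```python
import random
from itertools import combinations


def build_pairs(enhancers, background, seed=0):
    """Build labeled sequence pairs (hypothetical helper, for illustration).

    enhancers:  dict mapping a tissue name to a list of enhancer sequences
    background: list of random genomic (non-enhancer) sequences
    Returns a shuffled list of (seq_a, seq_b, label); label 1 = similar
    enhancer activity, label 0 = dissimilar.
    """
    rng = random.Random(seed)
    pairs = []

    # Similar pairs: two enhancers active in the same tissue.
    for seqs in enhancers.values():
        for a, b in combinations(seqs, 2):
            pairs.append((a, b, 1))

    # Dissimilar (i): two enhancers active in two different tissues.
    for t1, t2 in combinations(sorted(enhancers), 2):
        for a in enhancers[t1]:
            for b in enhancers[t2]:
                pairs.append((a, b, 0))

    # Dissimilar (ii): one enhancer and one random genomic sequence.
    # Sampling a single background sequence per enhancer is an assumption.
    for seqs in enhancers.values():
        for a in seqs:
            pairs.append((a, rng.choice(background), 0))

    # Dissimilar (iii): two random genomic sequences.
    for a, b in combinations(background, 2):
        pairs.append((a, b, 0))

    rng.shuffle(pairs)
    return pairs


# Toy data; the tissues and sequences are invented placeholders.
enhancers = {
    "wing": ["ACGTAC", "ACGTTC"],
    "eye": ["GGGCCC", "GGGCCA"],
}
background = ["TTTTAA", "ATATAT", "CAGTCA"]

pairs = build_pairs(enhancers, background)
n_similar = sum(label for _, _, label in pairs)
print(len(pairs), "pairs,", n_similar, "labeled similar")
```

A network trained on such pairs would emit a score between 0 and 1 for each pair. Because every tissue contributes to the same pair set, one model covers all tissues rather than one model per tissue.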

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluation of metric and representation learning approaches: Effects of representations driven by relative distance on the performance.

Other publications by Hani Z. Girgis

Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models
  • DOI:
    10.1101/449801
  • Year published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alfredo Velasco; Benjamin T. James; Vincent D Wells; Hani Z. Girgis
  • Corresponding author:
    Hani Z. Girgis
Characterizing the epigenetic signatures of the human regulatory elements: A pilot study
  • DOI:
    10.1101/059394
  • Year published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    S. L. Clement; Hani Z. Girgis
  • Corresponding author:
    Hani Z. Girgis

Similar overseas grants

TRUST2 - Improving TRUST in artificial intelligence and machine learning for critical building management
  • Grant number:
    10093095
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Collaborative R&D
QUANTUM-TOX - Revolutionizing Computational Toxicology with Electronic Structure Descriptors and Artificial Intelligence
  • Grant number:
    10106704
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    EU-Funded
Artificial intelligence in education: Democratising policy
  • Grant number:
    DP240100602
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Discovery Projects
Application of artificial intelligence to predict biologic systemic therapy clinical response, effectiveness and adverse events in psoriasis
  • Grant number:
    MR/Y009657/1
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Fellowship
REU Site: CyberAI: Cybersecurity Solutions Leveraging Artificial Intelligence for Smart Systems
  • Grant number:
    2349104
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
EAGER: Artificial Intelligence to Understand Engineering Cultural Norms
  • Grant number:
    2342384
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
Reversible Computing and Reservoir Computing with Magnetic Skyrmions for Energy-Efficient Boolean Logic and Artificial Intelligence Hardware
  • Grant number:
    2343607
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
I-Corps: Translation Potential of a Secure Data Platform Empowering Artificial Intelligence Assisted Digital Pathology
  • Grant number:
    2409130
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
Planning: Artificial Intelligence Assisted High-Performance Parallel Computing for Power System Optimization
  • Grant number:
    2414141
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
Reassessing the Appropriateness of currently-available Data-set Protection Levers in the era of Artificial Intelligence
  • Grant number:
    23K22068
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Grant-in-Aid for Scientific Research (B)