Measuring functional similarity between transcriptional enhancers using deep learning

Basic information

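  • Grant number:
    10302539
  • Principal investigator:
  • Amount:
    $367,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Start and end dates:
    2021-09-01 to 2024-08-31
  • Project status:
    Completed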

Project summary

Understanding transcriptional regulation remains a major task in molecular biology. Enhancers are genetic elements that regulate when and where genes are expressed, as well as their expression levels. These elements are hard to discover because their locations and orientations are not constrained with respect to their target genes. Several diseases, and susceptibility to certain diseases, are linked to mutations and variants in enhancers. Multiple experimental and computational methods have been developed for locating enhancers. Computational methods are better suited to handling the large number of genomes now being sequenced because they are faster, cheaper, and less labor intensive than experimental methods. Despite the many available computational tools, we lack a sophisticated tool that can measure the similarity in enhancer activity between a pair of sequences. We propose here to use Deep Artificial Neural Networks (DANNs) to develop such a tool. The long-term objective of this project is to decipher the code governing gene regulation, with the following specific aims: (i) design a computational tool for measuring enhancer-enhancer similarity, (ii) validate up to 96 putative enhancers experimentally, (iii) understand enhancer grammar, and (iv) annotate enhancers in more than 50 insect genomes. To achieve these aims, a novel application of DANNs is proposed. Current tools use DANNs to answer a yes-no question: does a sequence have similar activity to the tissue-specific enhancers comprising a particular training set of known enhancers? These approaches require training a separate network for each tissue, leading to inconsistent performance across tissues. Instead, we use a DANN to answer a related but different question: does this sequence have similar enhancer activity to a single known tissue-specific enhancer?
This deep network should perform consistently on different cell types because it is trained on pairs of sequences (not on individual sequences, as the available tools are) representing all tissues for which there are known enhancers. The DANN is trained to distinguish sequence pairs with similar enhancer activities from pairs with dissimilar activities; the dissimilar pairs include (i) two enhancers active in two different tissues, (ii) one enhancer and a random genomic sequence, and (iii) two random genomic sequences. The tool outputs a score between 0 and 1 indicating how similar the enhancer activities of the two sequences are. Using a much simpler machine learning algorithm than DANNs, we demonstrate that pairs with similar enhancer activities can be separated, with very high accuracy, from pairs of random genomic sequences and from pairs of one enhancer and a random genomic sequence. The new tool has many important potential applications, including consistent annotation of enhancers across cell types and related species. It can annotate enhancers active in a cell type that has only a small number of known enhancers, and it can annotate enhancers in related genomes when a set of known enhancers has been demarcated in one of them. Discovering new transcription factor binding sites is another potential application. The proposed tool can also facilitate the study of enhancer "design principles" and of the effects of variants. Such applications will advance our field.
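
The pair-based training set described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `build_pairs`, the dictionary layout, and the rule that a "similar" pair means two enhancers labeled with the same tissue are all assumptions made for the example, and enhancers active in several tissues are ignored for simplicity.

```python
import itertools
import random


def build_pairs(enhancers_by_tissue, random_seqs, seed=0):
    """Return (seq_a, seq_b, label) tuples; label 1 = similar, 0 = dissimilar.

    enhancers_by_tissue: dict mapping a tissue name to a list of enhancer sequences
    random_seqs: list of random genomic sequences (hypothetical negative controls)
    """
    rng = random.Random(seed)
    tissues = sorted(enhancers_by_tissue)
    pairs = []

    # Similar pairs (assumption): two enhancers active in the same tissue.
    for tissue in tissues:
        for a, b in itertools.combinations(enhancers_by_tissue[tissue], 2):
            pairs.append((a, b, 1))

    # Dissimilar (i): two enhancers active in two different tissues.
    for t1, t2 in itertools.combinations(tissues, 2):
        for a in enhancers_by_tissue[t1]:
            for b in enhancers_by_tissue[t2]:
                pairs.append((a, b, 0))

    # Dissimilar (ii): one enhancer and one random genomic sequence.
    for tissue in tissues:
        for a in enhancers_by_tissue[tissue]:
            pairs.append((a, rng.choice(random_seqs), 0))

    # Dissimilar (iii): two random genomic sequences.
    for a, b in itertools.combinations(random_seqs, 2):
        pairs.append((a, b, 0))

    return pairs
```

Because every tissue contributes to the same pool of labeled pairs, a single model can be trained on all of them, which is the property the summary relies on for consistent performance across cell types.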

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluation of metric and representation learning approaches: Effects of representations driven by relative distance on the performance.

Other publications by Hani Z. Girgis

Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models
  • DOI:
    10.1101/449801
  • Year published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alfredo Velasco; Benjamin T. James; Vincent D Wells; Hani Z. Girgis
  • Corresponding author:
    Hani Z. Girgis
Characterizing the epigenetic signatures of the human regulatory elements: A pilot study
  • DOI:
    10.1101/059394
  • Year published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    S. L. Clement; Hani Z. Girgis
  • Corresponding author:
    Hani Z. Girgis

Similar overseas grants

TRUST2 - Improving TRUST in artificial intelligence and machine learning for critical building management
  • Grant number:
    10093095
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Collaborative R&D
QUANTUM-TOX - Revolutionizing Computational Toxicology with Electronic Structure Descriptors and Artificial Intelligence
  • Grant number:
    10106704
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    EU-Funded
Artificial intelligence in education: Democratising policy
  • Grant number:
    DP240100602
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Discovery Projects
Application of artificial intelligence to predict biologic systemic therapy clinical response, effectiveness and adverse events in psoriasis
  • Grant number:
    MR/Y009657/1
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Fellowship
REU Site: CyberAI: Cybersecurity Solutions Leveraging Artificial Intelligence for Smart Systems
  • Grant number:
    2349104
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
EAGER: Artificial Intelligence to Understand Engineering Cultural Norms
  • Grant number:
    2342384
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
Reversible Computing and Reservoir Computing with Magnetic Skyrmions for Energy-Efficient Boolean Logic and Artificial Intelligence Hardware
  • Grant number:
    2343607
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
I-Corps: Translation Potential of a Secure Data Platform Empowering Artificial Intelligence Assisted Digital Pathology
  • Grant number:
    2409130
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
Planning: Artificial Intelligence Assisted High-Performance Parallel Computing for Power System Optimization
  • Grant number:
    2414141
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Standard Grant
Reassessing the Appropriateness of currently-available Data-set Protection Levers in the era of Artificial Intelligence
  • Grant number:
    23K22068
  • Fiscal year:
    2024
  • Funding amount:
    $367,900
  • Project category:
    Grant-in-Aid for Scientific Research (B)