Graph Grammars for Molecular Structure Search and Classification


Basic information

Project abstract

Numerous fields of study focus on small molecules. A prominent example is drug design, where small molecules are used to inhibit or activate proteins to achieve a desired biological function. In these fields, we often want to scan databases for molecules containing certain substructures. Traditionally, these substructures are modelled in chemical description languages such as Daylight's SMARTS. These languages tend to be very complex and are very restricted in their ability to describe the topological patterns of the underlying graphs. Parsing and matching patterns against a database of molecules is NP-complete. To circumvent these problems, we propose a simple graph grammar to describe substructures. Even very simple graph rewriting systems offer expressive power that nearly reaches that of SMARTS. To use these graph grammars for molecular structure search, we have to solve the subgraph matching problem. Although this problem remains NP-complete, it becomes solvable in polynomial time if every minimal cut of the query graph has bounded size, which we find empirically to hold for most molecules in the standard databases. We will investigate the complexity of the problem for further known graph parameters and try to relate the maximal size of a minimal cut to other parameters, focusing on parameters that are typically small for molecular graphs. We will also make our basic algorithm more efficient in practice. Furthermore, we want to derive over-approximations of the class of graphs generated by a grammar for which the subgraph matching problem can be solved more efficiently.

As a second research direction, we will develop and implement efficient algorithms for learning graph grammars from positive and negative examples. Given positive and negative examples for a chemical group, we aim to find a graph grammar that is as simple as possible, matches the positive examples, and does not match the negative ones. A trivial grammar that interpolates the positive and negative examples is one that generates exactly the positive examples, but it clearly overfits them. The underlying idea behind this learning task is to automatically identify aspects of the pharmacophore of these molecules. The challenge here is to simultaneously prevent overfitting and overgeneralization. We plan to develop constructive algorithms, i.e., algorithms that compute a simple graph grammar that interpolates the positive and negative examples, and improvement algorithms, i.e., algorithms that try to simplify a graph grammar while preserving its interpolating property.
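
The subgraph matching problem named in the abstract can be illustrated with a minimal sketch. The Python below is not the project's grammar-based method. It is plain backtracking over labelled graphs, under the simplifying assumptions that hydrogens are omitted and bond orders are ignored. The names `make_graph` and `find_match` are hypothetical and chosen only for this illustration. Its worst-case running time is exponential, in line with the NP-completeness mentioned above.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Assumption: a molecule is a labelled undirected graph.
# atoms maps a node id to an element symbol; bonds is a set of node-id pairs.
Graph = Tuple[Dict[int, str], Set[FrozenSet[int]]]


def make_graph(atoms: Dict[int, str], bonds: List[Tuple[int, int]]) -> Graph:
    """Build a graph from an atom table and a list of bonded node pairs."""
    return atoms, {frozenset(b) for b in bonds}


def find_match(query: Graph, target: Graph) -> Optional[Dict[int, int]]:
    """Return one injective map from query nodes to target nodes that keeps
    atom labels and sends every query bond to a target bond, or None."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    order = list(q_atoms)  # query nodes are assigned in this fixed order

    def extend(mapping: Dict[int, int]) -> Optional[Dict[int, int]]:
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for t in t_atoms:
            # Skip target atoms already used or carrying a different label.
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Each bond from q to an already mapped query atom
            # must also be present between their images in the target.
            consistent = all(
                frozenset((t, mapping[p])) in t_bonds
                for p in mapping
                if frozenset((q, p)) in q_bonds
            )
            if consistent:
                mapping[q] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q]  # backtrack
        return None

    return extend({})


# Query: a carbon bonded to two oxygens (carboxyl-like pattern).
pattern = make_graph({0: "C", 1: "O", 2: "O"}, [(0, 1), (0, 2)])
# Heavy-atom skeletons of acetic acid and ethanol.
acetic_acid = make_graph({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1), (1, 2), (1, 3)])
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])

print(find_match(pattern, acetic_acid))  # a mapping onto the carboxyl carbon and both oxygens
print(find_match(pattern, ethanol))      # None: ethanol has only one oxygen
```

Scanning a database then amounts to calling `find_match` once per stored molecule and keeping those for which a mapping exists.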

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants of Professor Dr. Ernst Althaus

Einfache und schnelle Implementierung von exakten Optimierungsalgorithmen mit SCIL
Simple and fast implementation of exact optimization algorithms with SCIL
  • Grant number:
    48021572
  • Fiscal year:
    2007
  • Funding amount:
    --
  • Project category:
    Priority Programmes

Similar overseas grants

The Emergence and Refinement of Grammars: perspectives from syntax and phonology
  • Grant number:
    2890509
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Studentship
EAGER: Building Language Technologies by Machine Reading Grammars
  • Grant number:
    2327143
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Standard Grant
Doctoral Dissertation Research: How flexible are grammars past puberty? Evidence from heritage language returnees
  • Grant number:
    2234698
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Standard Grant
Algorithms and Inference of Grammars and Natural Computing Models
  • Grant number:
    RGPIN-2022-05092
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category:
    Discovery Grants Program - Individual
MIM: Elucidating the Rules of Cooperation and Resiliency in Microbial Communities through Stochastic Graph Grammars
  • Grant number:
    2125965
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project category:
    Standard Grant
MIM: Elucidating the Rules of Cooperation and Resiliency in Microbial Communities through Stochastic Graph Grammars
  • Grant number:
    2126387
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project category:
    Standard Grant
CRII: III: Toward the Compression of Pangenomic DNA Sequence Data Using Context-Free Grammars
  • Grant number:
    2105391
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project category:
    Standard Grant
Vulnerable native grammars: the effects of limited input in native language attrition
  • Grant number:
    AH/T005157/1
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
    Research Grant
Integrating prosodic structure into computational grammars
  • Grant number:
    447093200
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
    WBP Fellowship
Natural Language Acquisition for Machines - Reinforcement Learning of Minimalist Grammars
  • Grant number:
    432615119
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
    Research Grants