Efficient statistical parsing and decoding for expressive grammar formalisms based on tree automata

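Basic information

  • Grant number:
    252303250
  • Principal investigator:
    Professor Dr. Alexander Koller
  • Amount:
    --
  • Host institution:
  • Host institution country:
    Germany
  • Project category:
    Research Grants
  • Fiscal year:
    2014
  • Funding country:
    Germany
  • Duration:
    2013-12-31 to 2022-12-31
  • Project status:
    Completed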

Project abstract

The aim of this project is to develop efficient algorithms for expressive grammar formalisms. Such grammar formalisms describe string languages that are not context-free; languages of more complex objects, such as trees or graphs; and relations between such objects. They can thus handle linguistic representations, and capture linguistic generalizations, that probabilistic context-free grammars (PCFGs) cannot. This is useful for many emerging NLP tasks, such as semantic parsing of strings into graph-based semantic representations.

The key idea of the project is to encode a wide variety of expressive grammar formalisms as Interpreted Regular Tree Grammars (IRTGs), and to specify algorithms for IRTGs in general; they will then apply directly to all the more specific formalisms. In the first phase, we have made significant progress in widening the range of formalisms which can be captured by IRTGs, including grammars for graph languages and for languages of sets. We also improved the performance of IRTG parsing algorithms drastically: parsing for PCFGs encoded as IRTGs is now 1000x faster than before (and roughly on par with dedicated PCFG parsers), and our parser for graph grammars is over 1000x faster than the previously best dedicated graph parser. On a theoretical level, we have clarified the formal relationships between expressive grammar formalisms; and on a practical level, researchers working with such grammar formalisms can directly utilize our generic algorithms and their open-source implementation, Alto.

In the second phase, we want to scale Alto to datasets of realistic size and complexity on NLP tasks such as parsing, translation, and generation. Even with the theoretical and foundational advances of the first phase, a number of challenges became visible as we applied Alto to increasingly complex domains. These challenges are common to all grammar-based approaches, and include the induction of grammars from corpora in which grammatical information is only incompletely observable, as well as scaling the speed of our parsing and translation algorithms to real-world data. We will tackle these challenges generally, by developing new algorithms or adapting existing ones to IRTGs. We will complement this grammar-based perspective with neural methods for parsing, which we will combine with the specific perspective on language offered by IRTGs.

The overall outcome of the project will be an end-to-end toolchain in which a user only needs to specify an expressive grammar formalism in terms of IRTGs and provide some data, and can then directly use our algorithms and implementations to induce and train a statistical grammar and use it for efficient parsing and translation.
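The idea of encoding a formalism as an IRTG can be illustrated with a toy sketch. It is not the project's implementation and does not use Alto's API; every name in it (`RTG`, `HOM_EN`, `HOM_SOV`, `derivations`, `evaluate`, `translate`) is hypothetical. It assumes the usual reading of an IRTG: a regular tree grammar generates derivation trees, and each interpretation maps a derivation tree, via a homomorphism, to a term that is evaluated in an algebra (here, strings under concatenation). Two interpretations over the same derivation trees define a relation between strings, which a brute-force search can use for translation. The efficient parsing algorithms named in the abstract are not shown.

```python
from itertools import product

# Regular tree grammar: nonterminal -> list of (rule label, child nonterminals).
# The grammar is non-recursive, so its tree language is finite.
RTG = {
    "S": [("r1", ("NP", "VP"))],
    "NP": [("r2", ()), ("r3", ())],
    "VP": [("r4", ("NP",)), ("r5", ())],
}

# Homomorphisms: rule label -> term over the string algebra.
# An int i refers to the value of the i-th child, a str is a word,
# and ("*", a, b, ...) is concatenation.
HOM_EN = {  # toy verb-medial interpretation
    "r1": ("*", 0, 1),
    "r2": "john",
    "r3": "mary",
    "r4": ("*", "loves", 0),
    "r5": "sleeps",
}
HOM_SOV = dict(HOM_EN, r4=("*", 0, "loves"))  # toy verb-final interpretation


def derivations(nonterminal):
    """Enumerate all derivation trees rooted in the given nonterminal."""
    for label, kids in RTG[nonterminal]:
        for combo in product(*(list(derivations(k)) for k in kids)):
            yield (label, *combo)


def eval_term(term, child_values):
    """Evaluate a homomorphism term in the string algebra (lists of words)."""
    if isinstance(term, int):
        return child_values[term]
    if isinstance(term, str):
        return [term]
    _op, *args = term  # the only operation in this sketch is concatenation
    words = []
    for arg in args:
        words.extend(eval_term(arg, child_values))
    return words


def evaluate(tree, hom):
    """Interpret a derivation tree bottom-up under one homomorphism."""
    label, *children = tree
    return eval_term(hom[label], [evaluate(c, hom) for c in children])


def translate(words):
    """Brute-force decoding: find derivations whose first interpretation
    matches the input, then read off the second interpretation."""
    return [
        " ".join(evaluate(t, HOM_SOV))
        for t in derivations("S")
        if evaluate(t, HOM_EN) == words
    ]


print(translate("john loves mary".split()))
```

Swapping the string algebra for a tree or graph algebra, while keeping the same tree grammar, is what would let generic IRTG algorithms cover other formalisms; this sketch only hints at that by sharing `derivations` and `evaluate` across both interpretations.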

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Professor Dr. Alexander Koller

Effiziente Algorithmen für die Mikroplanung und Realisierung in der Generierung natürlicher Sprache
Efficient algorithms for microplanning and realization in natural language generation
  • Grant number:
    27583293
  • Fiscal year:
    2006
  • Funding amount:
    --
  • Project category:
    Research Fellowships
The instructions of Paul V to the pontifical diplomats (1605-1621)
  • Grant number:
    5378185
  • Fiscal year:
    2002
  • Funding amount:
    --
  • Project category:
    Publication Grants
The Interactive Cookbook
  • Grant number:
    461220770
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Research Grants

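Similar NSFC grants

Research on wireless opportunistic scheduling algorithms based on stochastic network calculus
  • Grant number:
    60702009
  • Approval year:
    2007
  • Funding amount:
    240,000 CNY
  • Project category:
    Young Scientists Fund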

Similar overseas grants

Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
  • Grant number:
    23KJ0565
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for JSPS Fellows
Parsing Neurobiological Bases of Heterogeneity in ADHD
  • Grant number:
    10043983
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
Parsing Neurobiological Bases of Heterogeneity in ADHD
  • Grant number:
    10609948
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
Parsing Neurobiological Bases of Heterogeneity in ADHD
  • Grant number:
    10379072
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
Parsing Neurobiological Bases of Heterogeneity in ADHD
  • Grant number:
    10155553
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
Studies on robust statistical parsing across different domains using word embeddings
  • Grant number:
    16H06981
  • Fiscal year:
    2016
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Research Activity Start-up
statistical modeling for context-aware parsing
  • Grant number:
    392280-2010
  • Fiscal year:
    2012
  • Funding amount:
    --
  • Project category:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
RI: Small: Statistical Machine Translation Through a Tree Adjoining Grammar with Flexible Parsing Operations
  • Grant number:
    1161814
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project category:
    Standard Grant
statistical modeling for context-aware parsing
  • Grant number:
    392280-2010
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project category:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
statistical modeling for context-aware parsing
  • Grant number:
    392280-2010
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project category:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral