Efficient statistical parsing and decoding for expressive grammar formalisms based on tree automata
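Basic information
- Grant number: 252303250
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project type: Research Grants
- Fiscal year: 2014
- Funding country: Germany
- Duration: 2013-12-31 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract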
The aim of this project is to develop efficient algorithms for expressive grammar formalisms. Such grammar formalisms describe string languages that are not context-free; languages of more complex objects, such as trees or graphs; and relations between such objects. They can thus handle linguistic representations, and capture linguistic generalizations, that probabilistic context-free grammars (PCFGs) cannot. This is useful for many emerging NLP tasks, such as semantic parsing of strings into graph-based semantic representations.
The key idea of the project is to encode a wide variety of expressive grammar formalisms as Interpreted Regular Tree Grammars (IRTGs), and to specify algorithms for IRTGs in general; they will then apply directly to all the more specific formalisms. In the first phase, we have made significant progress in widening the range of formalisms which can be captured by IRTGs, including grammars for graph languages and for languages of sets. We also improved the performance of IRTG parsing algorithms drastically: parsing for PCFGs encoded as IRTGs is now 1000x faster than before (and roughly on par with dedicated PCFG parsers), and our parser for graph grammars is over 1000x faster than the previously best dedicated graph parser. On a theoretical level, we have clarified the formal relationships between expressive grammar formalisms; and on a practical level, researchers working with such grammar formalisms can directly utilize our generic algorithms and their open-source implementation, Alto.
In the second phase, we want to scale Alto to datasets of realistic size and complexity on NLP tasks such as parsing, translation, and generation. Even with the theoretical and foundational advances of the first phase, a number of challenges became visible as we applied Alto to increasingly complex domains. These challenges are common to all grammar-based approaches, and include the induction of grammars from corpora in which grammatical information is only incompletely observable, as well as scaling the speed of our parsing and translation algorithms to real-world data. We will tackle these challenges generally, by developing new algorithms or adapting existing ones to IRTGs. We will complement this grammar-based perspective with neural methods for parsing, which we will combine with the specific perspective on language offered by IRTGs.
The overall outcome of the project will be an end-to-end toolchain in which a user only needs to specify an expressive grammar formalism in terms of IRTGs and provide some data, and can then directly use our algorithms and implementations to induce and train a statistical grammar and use it for efficient parsing and translation.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Professor Dr. Alexander Koller
Effiziente Algorithmen für die Mikroplanung und Realisierung in der Generierung natürlicher Sprache
Efficient algorithms for microplanning and realization in natural language generation
- Grant number: 27583293
- Fiscal year: 2006
- Funding amount: --
- Project type: Research Fellowships
The instructions of Paul V to the pontifical diplomats (1605-1621)
- Grant number: 5378185
- Fiscal year: 2002
- Funding amount: --
- Project type: Publication Grants
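Similar grants from the National Natural Science Foundation of China (NSFC)
Research on wireless opportunistic scheduling algorithms based on stochastic network calculus
- Grant number: 60702009
- Approval year: 2007
- Funding amount: 240,000 CNY
- Project type: Young Scientists Fund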
Similar overseas grants
Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
- Grant number: 23KJ0565
- Fiscal year: 2023
- Funding amount: --
- Project type: Grant-in-Aid for JSPS Fellows
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10043983
- Fiscal year: 2020
- Funding amount: --
- Project type:
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10609948
- Fiscal year: 2020
- Funding amount: --
- Project type:
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10379072
- Fiscal year: 2020
- Funding amount: --
- Project type:
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10155553
- Fiscal year: 2020
- Funding amount: --
- Project type:
Studies on robust statistical parsing across different domains using word embeddings
- Grant number: 16H06981
- Fiscal year: 2016
- Funding amount: --
- Project type: Grant-in-Aid for Research Activity Start-up
Statistical modeling for context-aware parsing
- Grant number: 392280-2010
- Fiscal year: 2012
- Funding amount: --
- Project type: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
RI: Small: Statistical Machine Translation Through a Tree Adjoining Grammar with Flexible Parsing Operations
- Grant number: 1161814
- Fiscal year: 2011
- Funding amount: --
- Project type: Standard Grant
Statistical modeling for context-aware parsing
- Grant number: 392280-2010
- Fiscal year: 2011
- Funding amount: --
- Project type: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Statistical modeling for context-aware parsing
- Grant number: 392280-2010
- Fiscal year: 2010
- Funding amount: --
- Project type: Alexander Graham Bell Canada Graduate Scholarships - Doctoral


















