Efficient statistical parsing and decoding for expressive grammar formalisms based on tree automata
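Basic information
- Grant number: 252303250
- Principal investigator:
- Funding amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research Grants
- Fiscal year: 2014
- Funding country: Germany
- Project period: 2013-12-31 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract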
The aim of this project is to develop efficient algorithms for expressive grammar formalisms. Such grammar formalisms describe string languages that are not context-free; languages of more complex objects, such as trees or graphs; and relations between such objects. They can thus handle linguistic representations, and capture linguistic generalizations, that probabilistic context-free grammars (PCFGs) cannot. This is useful for many emerging NLP tasks, such as semantic parsing of strings into graph-based semantic representations.
The key idea of the project is to encode a wide variety of expressive grammar formalisms as Interpreted Regular Tree Grammars (IRTGs), and to specify algorithms for IRTGs in general; they will then apply directly to all the more specific formalisms. In the first phase, we have made significant progress in widening the range of formalisms which can be captured by IRTGs, including grammars for graph languages and for languages of sets. We also improved the performance of IRTG parsing algorithms drastically: parsing for PCFGs encoded as IRTGs is now 1000x faster than before (and roughly on par with dedicated PCFG parsers), and our parser for graph grammars is over 1000x faster than the previously best dedicated graph parser. On a theoretical level, we have clarified the formal relationships between expressive grammar formalisms; and on a practical level, researchers working with such grammar formalisms can directly utilize our generic algorithms and their open-source implementation, Alto.
In the second phase, we want to scale Alto to datasets of realistic size and complexity on NLP tasks such as parsing, translation, and generation. Even with the theoretical and foundational advances of the first phase, a number of challenges became visible as we applied Alto to increasingly complex domains. These challenges are common to all grammar-based approaches, and include the induction of grammars from corpora in which grammatical information is only incompletely observable, as well as scaling the speed of our parsing and translation algorithms to real-world data. We will tackle these challenges generally, by developing new algorithms or adapting existing ones to IRTGs. We will complement this grammar-based perspective with neural methods for parsing, which we will combine with the specific perspective on language offered by IRTGs.
The overall outcome of the project will be an end-to-end toolchain in which a user only needs to specify an expressive grammar formalism in terms of IRTGs and provide some data, and can then directly use our algorithms and implementations to induce and train a statistical grammar and use it for efficient parsing and translation.
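The key idea in the abstract, one grammar over derivation trees that is interpreted into different algebras, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the rule names (`r_s`, `r_vp`, ...), the three-word lexicon, and the helper functions `derives` and `interpret` are hypothetical and are not taken from Alto or from the project itself. The sketch only shows how a single derivation tree can be evaluated under two string interpretations, which yields a pair of related strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Tree:
    """A derivation tree whose node labels are (hypothetical) rule names."""
    label: str
    children: Tuple["Tree", ...] = ()


# Toy regular tree grammar: rule name -> (left-hand side, child nonterminals).
RULES: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "r_s": ("S", ("NP", "VP")),
    "r_vp": ("VP", ("V", "NP")),
    "r_john": ("NP", ()),
    "r_mary": ("NP", ()),
    "r_loves": ("V", ()),
}


def derives(tree: Tree, nonterminal: str) -> bool:
    """Check whether the grammar generates `tree` from `nonterminal`."""
    if tree.label not in RULES:
        return False
    lhs, rhs = RULES[tree.label]
    return (
        lhs == nonterminal
        and len(rhs) == len(tree.children)
        and all(derives(c, nt) for c, nt in zip(tree.children, rhs))
    )


def interpret(tree: Tree, ops: Dict[str, Callable[..., str]]) -> str:
    """Evaluate a derivation tree bottom-up under one interpretation."""
    args = [interpret(child, ops) for child in tree.children]
    return ops[tree.label](*args)


# Lexical rules are shared by both interpretations in this toy example.
LEXICON: Dict[str, Callable[..., str]] = {
    "r_john": lambda: "John",
    "r_mary": lambda: "Mary",
    "r_loves": lambda: "loves",
}

# Interpretation 1: verb before object.
ENGLISH = {
    **LEXICON,
    "r_s": lambda np, vp: f"{np} {vp}",
    "r_vp": lambda v, np: f"{v} {np}",
}

# Interpretation 2: same derivation trees, verb placed after its object.
VERB_FINAL = {
    **LEXICON,
    "r_s": lambda np, vp: f"{np} {vp}",
    "r_vp": lambda v, np: f"{np} {v}",
}

tree = Tree("r_s", (Tree("r_john"), Tree("r_vp", (Tree("r_loves"), Tree("r_mary")))))

print(derives(tree, "S"))           # True
print(interpret(tree, ENGLISH))     # John loves Mary
print(interpret(tree, VERB_FINAL))  # John Mary loves
```

Because `derives` and `interpret` only look at the derivation tree and a table of operations, swapping in a different table (for example, operations that build trees or graphs instead of strings) would leave both functions unchanged. That is the sense in which an algorithm written once at this level carries over to more specific formalisms.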
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Professor Dr. Alexander Koller
Effiziente Algorithmen für die Mikroplanung und Realisierung in der Generierung natürlicher Sprache
Efficient algorithms for microplanning and realization in natural language generation
- Grant number: 27583293
- Fiscal year: 2006
- Funding amount: --
- Project category: Research Fellowships
The instructions of Paul V to the pontificial diplomats (1605-1621)
- Grant number: 5378185
- Fiscal year: 2002
- Funding amount: --
- Project category: Publication Grants
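Similar NSFC grants
Research on wireless opportunistic scheduling algorithms based on stochastic network calculus
- Grant number: 60702009
- Approval year: 2007
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund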
Similar overseas grants
Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
- Grant number: 23KJ0565
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for JSPS Fellows
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10043983
- Fiscal year: 2020
- Funding amount: --
- Project category:
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10609948
- Fiscal year: 2020
- Funding amount: --
- Project category:
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10379072
- Fiscal year: 2020
- Funding amount: --
- Project category:
Parsing Neurobiological Bases of Heterogeneity in ADHD
- Grant number: 10155553
- Fiscal year: 2020
- Funding amount: --
- Project category:
Studies on robust statistical parsing across different domains using word embeddings
- Grant number: 16H06981
- Fiscal year: 2016
- Funding amount: --
- Project category: Grant-in-Aid for Research Activity Start-up
Statistical modeling for context-aware parsing
- Grant number: 392280-2010
- Fiscal year: 2012
- Funding amount: --
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
RI: Small: Statistical Machine Translation Through a Tree Adjoining Grammar with Flexible Parsing Operations
- Grant number: 1161814
- Fiscal year: 2011
- Funding amount: --
- Project category: Standard Grant
Statistical modeling for context-aware parsing
- Grant number: 392280-2010
- Fiscal year: 2011
- Funding amount: --
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Statistical modeling for context-aware parsing
- Grant number: 392280-2010
- Fiscal year: 2010
- Funding amount: --
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral


















