Efficient statistical parsing and decoding for expressive grammar formalisms based on tree automata
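Basic information
- Grant number: 252303250
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research Grants
- Fiscal year: 2014
- Funding country: Germany
- Duration: 2013-12-31 to 2022-12-31
- Project status: Completed
- Source:
- Keywords: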
Project abstract
The aim of this project is to develop efficient algorithms for expressive grammar formalisms. Such grammar formalisms describe string languages that are not context-free; languages of more complex objects, such as trees or graphs; and relations between such objects. They can thus handle linguistic representations, and capture linguistic generalizations, that probabilistic context-free grammars (PCFGs) cannot. This is useful for many emerging NLP tasks, such as semantic parsing of strings into graph-based semantic representations.
The key idea of the project is to encode a wide variety of expressive grammar formalisms as Interpreted Regular Tree Grammars (IRTGs), and to specify algorithms for IRTGs in general; they will then apply directly to all the more specific formalisms.
In the first phase, we have made significant progress in widening the range of formalisms which can be captured by IRTGs, including grammars for graph languages and for languages of sets. We also improved the performance of IRTG parsing algorithms drastically: parsing for PCFGs encoded as IRTGs is now 1000x faster than before (and roughly on par with dedicated PCFG parsers), and our parser for graph grammars is over 1000x faster than the previously best dedicated graph parser. On a theoretical level, we have clarified the formal relationships between expressive grammar formalisms; and on a practical level, researchers working with such grammar formalisms can directly utilize our generic algorithms and their open-source implementation, Alto.
In the second phase, we want to scale Alto to datasets of realistic size and complexity on NLP tasks such as parsing, translation, and generation. Even with the theoretical and foundational advances of the first phase, a number of challenges became visible as we applied Alto to increasingly complex domains. These challenges are common to all grammar-based approaches, and include the induction of grammars from corpora in which grammatical information is only incompletely observable, as well as scaling the speed of our parsing and translation algorithms to real-world data. We will tackle these challenges generally, by developing new algorithms or adapting existing ones to IRTGs. We will complement this grammar-based perspective with neural methods for parsing, which we will combine with the specific perspective on language offered by IRTGs.
The overall outcome of the project will be an end-to-end toolchain in which a user only needs to specify an expressive grammar formalism in terms of IRTGs and provide some data, and can then directly use our algorithms and implementations to induce and train a statistical grammar and use it for efficient parsing and translation.
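To make the IRTG idea in the abstract concrete, here is a minimal sketch in Python. It assumes a toy, non-recursive grammar invented purely for illustration: a regular tree grammar generates derivation trees, and two homomorphisms interpret each derivation tree, one as a word sequence and one as a small predicate tuple. All names (`RTG`, `STRING_HOM`, `SEM_HOM`, `derivations`, `interpret`, `decode`) are hypothetical and are not taken from Alto. Decoding is done by brute-force enumeration, which is only meant to show the relation an IRTG defines and makes no claim about how the project's algorithms are implemented.

```python
from itertools import product

# Toy regular tree grammar (an assumption for illustration):
# nonterminal -> list of (rule label, child nonterminals).
RTG = {
    "S": [("r1", ("NP", "VP"))],
    "NP": [("r2", ()), ("r3", ())],
    "VP": [("r4", ("NP",)), ("r5", ())],
}

# Interpretation 1: each rule label becomes an operation over word lists.
STRING_HOM = {
    "r1": lambda np, vp: np + vp,
    "r2": lambda: ["John"],
    "r3": lambda: ["Mary"],
    "r4": lambda np: ["loves"] + np,
    "r5": lambda: ["sleeps"],
}

# Interpretation 2: the same labels become operations over predicate tuples.
SEM_HOM = {
    "r1": lambda np, vp: (vp[0], np) + vp[1:],  # insert the subject after the predicate
    "r2": lambda: "john",
    "r3": lambda: "mary",
    "r4": lambda np: ("loves", np),
    "r5": lambda: ("sleeps",),
}


def derivations(nt):
    """Yield every derivation tree (label, children) rooted in nonterminal nt.

    Terminates only because the toy grammar is non-recursive.
    """
    for label, child_nts in RTG[nt]:
        child_options = [list(derivations(c)) for c in child_nts]
        for children in product(*child_options):
            yield (label, children)


def interpret(tree, hom):
    """Evaluate a derivation tree bottom-up under one interpretation."""
    label, children = tree
    return hom[label](*(interpret(c, hom) for c in children))


def decode(words):
    """Map a word list to the semantic values of all matching derivations."""
    return [
        interpret(t, SEM_HOM)
        for t in derivations("S")
        if interpret(t, STRING_HOM) == words
    ]


if __name__ == "__main__":
    print(decode(["John", "loves", "Mary"]))
    print(decode(["Mary", "sleeps"]))
```

Because both interpretations share the same derivation trees, swapping `SEM_HOM` for a homomorphism into another algebra (for example, one that builds trees or graphs) changes the output objects without touching the grammar or the decoding loop. That separation is what allows algorithms stated for IRTGs to carry over to the more specific formalisms.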
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Professor Dr. Alexander Koller
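Effiziente Algorithmen für die Mikroplanung und Realisierung in der Generierung natürlicher Sprache
Efficient algorithms for microplanning and realization in natural language generation
- Grant number: 27583293
- Fiscal year: 2006
- Funding amount: --
- Project category: Research Fellowships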
The instructions of Paul V to the pontificial diplomats (1605-1621)
- Grant number: 5378185
- Fiscal year: 2002
- Funding amount: --
- Project category: Publication Grants
Similar grants from the National Natural Science Foundation of China (NSFC)
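Source apportionment and transformation mechanisms of phosphorus pollution in typical urban-rural rivers of the upper Yangtze River, combining stable isotopes and multivariate statistics
- Grant number: 52170104
- Approval year: 2021
- Funding amount: CNY 580,000
- Project category: General Program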
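A thickness distribution scheme for sea ice models based on a mixture model and analytic-form distribution functions, and its applications
- Grant number: 41575076
- Approval year: 2015
- Funding amount: CNY 680,000
- Project category: General Program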
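Structure elucidation and quantitative analysis of complex mixtures using statistically correlated NMR and LC-MS data
- Grant number: 21505142
- Approval year: 2015
- Funding amount: CNY 210,000
- Project category: Young Scientists Fund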
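Polarization-optical methods for size-group statistics and source apportionment of atmospheric particulate matter
- Grant number: 41475125
- Approval year: 2014
- Funding amount: CNY 1,000,000
- Project category: General Program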
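Development and evaluation of receptor-model-based source apportionment methods for heavy metals in regional soils
- Grant number: 41401523
- Approval year: 2014
- Funding amount: CNY 250,000
- Project category: Young Scientists Fund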
Similar overseas grants
Deepening and Expanding Research for Efficient Methods of Function Estimation in High Dimensional Statistical Analysis
- Grant number: 23H03353
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (B)
Statistical approach for efficient and optimized evaluation of new treatment
- Grant number: 23K09640
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
- Grant number: 23KJ0565
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for JSPS Fellows
Development of plastic theory based on statistical mechanics to realize effect of dislocation behavior
- Grant number: 23K18458
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Statistical modeling via functional data analysis and its application to various fields
- Grant number: 23K11005
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)