Hybrid grammars for discontinuous phrase structure trees

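Basic information

Grant number:
255344147
Principal investigator:
Professor Dr.-Ing. Heiko Vogler
Funding amount:
--
Host institution:
Professur für Grundlagen der Programmierung
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2014
Funding country:
Germany
Project period:
2013-12-31 to 2014-12-31
Project status:
Completed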

Source:
https://gepris.dfg.de/gepris/projekt/255344147?language=en
Keywords:
Hybrid grammars discontinuous phrase structure

Project abstract

The syntactic structure of sentences of a natural language is usually given by a context-free grammar. This grammar can be used to parse a given sentence and to build its parse tree. Each parse tree is continuous in the following sense: at each node, the frontier of the i-th subtree (i.e., the concatenation of its leaf labels from left to right) is placed to the left of the frontier of the j-th subtree whenever i is smaller than j. For languages with relatively free word order (e.g., German and Dutch), discontinuous parse trees also occur. For instance, in the sentence "Sie hat oft geschrieben", the two words "hat" and "geschrieben" of the verb phrase do not occur consecutively in the sentence; they are separated, i.e., discontinuous. The phenomenon of cross-serial dependencies in Dutch also leads to discontinuous parse trees. For instance, in the sentence "omdat ik Peter Cecilia de nijlpaarden zag helpen voeren", the two words "ik" and "zag" belong together and, e.g., have to be inflected together, but they are placed discontinuously in the sentence; the same applies to the word pairs "Peter-helpen" and "Cecilia-voeren". In this short research project I would like to introduce a new grammar formalism, called hybrid grammars, which generates discontinuous parse trees. Each rule of a hybrid grammar has a probability, which is used to rank all the possible parse trees of a given sentence. In particular, I want to find out to what extent hybrid grammars are an appropriate tool for natural language processing. This raises the following questions: How can one extract rules for a hybrid grammar from a corpus? How can one train the probabilities of the rules? How efficient is a parser? The theoretical questions should be accompanied by practical implementations.
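
The notion of discontinuity described in the abstract can be illustrated with a small Python sketch. It is not part of the project and does not implement hybrid grammars. It only checks, for the example "Sie hat oft geschrieben", which nodes of a tree cover a set of word positions with a gap. The class `Node`, the helper names, and the tree shape (a VP over "hat" and "geschrieben", with "oft" attached to S) are assumptions made for illustration. The check uses a span-based criterion (the leaves under a node must form one unbroken range of positions) rather than the literal ordering condition on subtree frontiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical tree node; leaves carry a 0-based sentence position."""
    label: str
    position: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def positions(node: Node) -> Set[int]:
    """Return the sentence positions of all leaves below this node."""
    if not node.children:
        return {node.position}
    covered: Set[int] = set()
    for child in node.children:
        covered |= positions(child)
    return covered


def is_contiguous(covered: Set[int]) -> bool:
    """True if the positions form one unbroken range without gaps."""
    return max(covered) - min(covered) + 1 == len(covered)


def discontinuous_nodes(node: Node) -> List[str]:
    """Collect labels of all nodes whose leaves leave a gap in the sentence."""
    found: List[str] = []
    if not is_contiguous(positions(node)):
        found.append(node.label)
    for child in node.children:
        found.extend(discontinuous_nodes(child))
    return found


# Assumed analysis of "Sie hat oft geschrieben" (positions 0..3):
# the verb phrase groups "hat" and "geschrieben", skipping "oft".
verb_phrase = Node("VP", children=[Node("hat", 1), Node("geschrieben", 3)])
tree = Node("S", children=[Node("Sie", 0), verb_phrase, Node("oft", 2)])

print(discontinuous_nodes(tree))
```

Under the assumed tree, only the VP node is reported, because its leaves occupy positions 1 and 3 while position 2 ("oft") belongs to a different subtree. A tree produced by a context-free grammar would never trigger this check.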
