Hybrid grammars for discontinuous phrase structure trees
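Basic information
- Grant number: 255344147
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research Grants
- Fiscal year: 2014
- Funding country: Germany
- Duration: 2013-12-31 to 2014-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract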
The syntactic structure of sentences of a natural language is usually given by a context-free grammar. This grammar can be used to parse a given sentence and to build its parse tree. Each parse tree is continuous in the following sense: at every node, the frontier of the i-th subtree (i.e., the concatenation of its leaf labels from left to right) lies to the left of the frontier of the j-th subtree whenever i is smaller than j. In languages with relatively free word order (e.g., German and Dutch), discontinuous parse trees also occur. For instance, in the sentence "Sie hat oft geschrieben", the two words "hat" and "geschrieben" of the verb phrase do not occur consecutively but are separated, i.e., they are discontinuous. The phenomenon of cross-serial dependencies in Dutch also leads to discontinuous parse trees. For instance, in the sentence "omdat ik Peter Cecilia de nijlpaarden zag helpen voeren", the two words "ik" and "zag" belong together and, e.g., have to be inflected together, but they are placed discontinuously in the sentence; the same applies to the word pairs "Peter-helpen" and "Cecilia-voeren". In this short research project I would like to introduce a new grammar formalism, called hybrid grammars, which generates discontinuous parse trees. Each rule of a hybrid grammar carries a probability so that all possible parse trees of a given sentence can be ranked. In particular, I want to find out to what extent hybrid grammars are an appropriate tool for natural language processing. This raises the following questions: How can one extract rules for a hybrid grammar from a corpus? How can one train the probabilities of the rules? How efficient is a parser? The theoretical questions should be accompanied by practical implementations.
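The notion of discontinuity described in the abstract can be illustrated with a small Python sketch. It is not part of the project and does not implement hybrid grammars. It assumes a common characterization: a node is discontinuous when the sentence positions covered by its subtree leave a gap. The `Node` class, the function names, and the tree given for "Sie hat oft geschrieben" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    position: Optional[int] = None  # sentence position, set only for leaves
    children: List["Node"] = field(default_factory=list)


def positions(node: Node) -> List[int]:
    """Return the sorted sentence positions covered by the subtree at node."""
    if not node.children:
        return [node.position]
    covered = []
    for child in node.children:
        covered.extend(positions(child))
    return sorted(covered)


def discontinuous_nodes(node: Node) -> List[str]:
    """Collect the labels of all nodes whose covered positions leave a gap."""
    found = []
    covered = positions(node)
    # A contiguous span of n positions satisfies max - min + 1 == n.
    if covered[-1] - covered[0] + 1 != len(covered):
        found.append(node.label)
    for child in node.children:
        found.extend(discontinuous_nodes(child))
    return found


# Assumed (hypothetical) analysis of "Sie hat oft geschrieben", positions 0..3:
# the verb phrase groups "hat" and "geschrieben", with "oft" in between.
tree = Node("S", children=[
    Node("Sie", 0),
    Node("VP", children=[Node("hat", 1), Node("geschrieben", 3)]),
    Node("oft", 2),
])

print(discontinuous_nodes(tree))  # ['VP']
```

In this assumed tree only the VP node is reported, because it covers positions 1 and 3 but not position 2.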
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants of Professor Dr.-Ing. Heiko Vogler
Formale Modelle und Algorithmen zur syntaxbasierten maschinellen Übersetzung natürlicher Sprachen
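Formal models and algorithms for syntax-based machine translation of natural languages
- Grant number: 198961575
- Fiscal year: 2011
- Funding amount: --
- Project category: Research Grants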
Gewichtete Baumübersetzer als formales Werkzeug für die Syntax-basierte maschinelle Übersetzung natürlicher Sprachen
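Weighted tree transducers as a formal tool for syntax-based machine translation of natural languages
- Grant number: 142808156
- Fiscal year: 2009
- Funding amount: --
- Project category: Research Grants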
Gewichtete Baumautomaten über Multioperator-Monoiden
Weighted tree automata over multioperator monoids
- Grant number: 28443503
- Fiscal year: 2006
- Funding amount: --
- Project category: Research Grants
Similar overseas grants
The Emergence and Refinement of Grammars: perspectives from syntax and phonology
- Grant number: 2890509
- Fiscal year: 2023
- Funding amount: --
- Project category: Studentship
EAGER: Building Language Technologies by Machine Reading Grammars
- Grant number: 2327143
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Doctoral Dissertation Research: How flexible are grammars past puberty? Evidence from heritage language returnees
- Grant number: 2234698
- Fiscal year: 2023
- Funding amount: --
- Project category: Standard Grant
Using chemical thermodynamics on networks to understand the universality of biological sugar-phosphate metabolism
- Grant number: 22K03792
- Fiscal year: 2022
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
Algorithms and Inference of Grammars and Natural Computing Models
- Grant number: RGPIN-2022-05092
- Fiscal year: 2022
- Funding amount: --
- Project category: Discovery Grants Program - Individual
MIM: Elucidating the Rules of Cooperation and Resiliency in Microbial Communities through Stochastic Graph Grammars
- Grant number: 2125965
- Fiscal year: 2021
- Funding amount: --
- Project category: Standard Grant
MIM: Elucidating the Rules of Cooperation and Resiliency in Microbial Communities through Stochastic Graph Grammars
- Grant number: 2126387
- Fiscal year: 2021
- Funding amount: --
- Project category: Standard Grant
CRII: III: Toward the Compression of Pangenomic DNA Sequence Data Using Context-Free Grammars
- Grant number: 2105391
- Fiscal year: 2021
- Funding amount: --
- Project category: Standard Grant
Vulnerable native grammars: the effects of limited input in native language attrition
- Grant number: AH/T005157/1
- Fiscal year: 2020
- Funding amount: --
- Project category: Research Grant
Integrating prosodic structure into computational grammars
- Grant number: 447093200
- Fiscal year: 2020
- Funding amount: --
- Project category: WBP Fellowship