Models of morphosyntax for statistical machine translation
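Basic information
- Grant number: 123083856
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project category: Research Grants
- Fiscal year: 2009
- Funding country: Germany
- Duration: 2008-12-31 to 2017-12-31
- Project status: Completed
- Source:
- Keywords: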
Project abstract
Statistical approaches to machine translation (MT) have proven effective in the last few years. However, this does not hold when translating into a morphologically rich language, particularly when there is also significant syntactic divergence between the two languages. The quality of statistical machine translation (SMT) is poor in this case because the independence assumptions made between the models of morphology, syntax and translation do not reflect linguistic reality.
In the first phase of the project we made significant strides in translating into German, a morphologically rich language. We focused on issues of linguistic representation and linguistic resources within statistical machine translation. We carried out original research on German word formation (addressing both compounds and portmanteaus), German inflectional morphology, and syntactic issues in both English-to-German and German-to-English translation. We published seven papers at top-ranked international conferences as well as two workshop contributions, and also supervised Bachelor's-level and Master's-level student work relevant to the project.
In the proposed phase 2, we will move on from a focus on linguistic representation to advanced machine learning approaches for solving the linguistic problems inherent in the difficult machine translation language pair English/German. In the previous phase of the work, we focused on general linguistic problems in translation. One of the most important lessons we learned in analyzing the output of our linguistically enhanced systems is that the mismatch of domains between training data and test data is critically important. The training data is mostly taken from the European Parliament proceedings, but the test data is from the news domain or many other domains (including the medical domain, which we will study in this phase of the project).
In the proposed phase 2 of the Morphosyntax project, we will follow four main lines of work. We will extend our successful work on German word formation and inflectional morphology by reducing our dependence on hand-crafted morphological resources, determining how to perform semi-supervised acquisition of morphological resources and paying particular attention to the important issue of domain adaptation. Within hierarchical decoding (where syntactic formalisms are used as the representation for translation), we will study the integration of advanced machine learning methods for choosing syntactic reorderings. We will also study the issue of adding semantic information (in addition to syntactic information) into hierarchical decoding. Finally, we will study different ways to integrate powerful classification approaches (required for the other work packages) directly into the decoder rather than using external pre-processing or post-processing.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Professor Dr. Hinrich Schütze
ReMLAV: Relational Machine Learning for Argument Validation
- Grant number: 376183703
- Fiscal year: 2017
- Funding amount: --
- Project category: Priority Programmes
FADeBaC Sentiment Analysis - Fully Automatic DEnsity-BAsed Clustering applied to Sentiment Analysis
- Grant number: 219327280
- Fiscal year: 2012
- Funding amount: --
- Project category: Research Grants
WordGraph - Development of a unified graph-theoretical system for acquiring lexico-semantic phenomena
- Grant number: 42840215
- Fiscal year: 2007
- Funding amount: --
- Project category: Research Grants
Pretrained-Language-Model-Enabled Active Learning
- Grant number: 507660870
- Fiscal year:
- Funding amount: --
- Project category: Research Grants
Similar overseas grants
A Historical Linguistic Study to Establish "Southern Romance Languages" Based on Areal Features of Morphosyntax
- Grant number: 23K00514
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
A study of nDrapa morphosyntax
- Grant number: 23K00476
- Fiscal year: 2023
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
The morphosyntax of Mada, an endangered Niger Congo language of North-Central Nigeria
- Grant number: 2753580
- Fiscal year: 2022
- Funding amount: --
- Project category: Studentship
Words, phrases, and sentences at the interface of phonology and morphosyntax
- Grant number: 2041248
- Fiscal year: 2021
- Funding amount: --
- Project category: Continuing Grant
Production and Comprehension of Morphosyntax among Children with and without Developmental Language Disorder who speak non-Mainstream American English dialects
- Grant number: 10407444
- Fiscal year: 2020
- Funding amount: --
- Project category:
Production and Comprehension of Morphosyntax among Children with and without Developmental Language Disorder who speak non-Mainstream American English dialects
- Grant number: 10058835
- Fiscal year: 2020
- Funding amount: --
- Project category:
The Morphosyntax of RARE and Potential Predicates in Tohoku Dialects
- Grant number: 19K00554
- Fiscal year: 2019
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
The analysis of suspended affixation: coordination, morphosyntax and the architecture of the language faculty
- Grant number: 2241670
- Fiscal year: 2019
- Funding amount: --
- Project category: Studentship
A diachronic and typological study of Sardinian from a point of view of morphosyntax in undocumented dialects
- Grant number: 19K00563
- Fiscal year: 2019
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)
The Kobani variety of Kurmanji: Variation in morphosyntax
- Grant number: 2282814
- Fiscal year: 2019
- Funding amount: --
- Project category: Studentship