RI: Collaborative Research: Discriminative Knowledge-Rich Language Modeling for Machine Translation

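Basic Information

Award number:
0713402
Principal investigator:
Alon Lavie
Amount:
approx. $325,200
Institution:
Carnegie-Mellon University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2007
Funding country:
United States
Period:
2007-09-01 to 2012-08-31
Status:
Completed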

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0713402&HistoricalAwards=false
Keywords:
RI Collaborative Research Discriminative Knowledge

Project Abstract

This project investigates a novel approach for assessing the fluency and grammaticality of alternative translation hypotheses that are created within search-based Machine Translation (MT) systems. This task, commonly termed "Language Modeling" (LM), has been explored primarily in the context of speech recognition; however, current state-of-the-art language models (LMs) are not effective at distinguishing between more fluent grammatical translations and their poor alternatives. In contrast, the proposed approach, "Discriminative Knowledge-Rich Language Modeling" (DKRLM), is explicitly designed to find the most fluent and grammatical translations within the search space by comparing the linguistic features of the translation hypotheses against very large "clean" monolingual corpora. The intuition is that more grammatical translation hypotheses should contain higher proportions of features seen in the large corpora. An important contribution of the project is in exploring different types of linguistic features to identify those that are most informative for the comparisons. Moreover, discriminative training is performed to incorporate the features into a system-independent scoring function, replacing traditional LMs in MT systems. The broader impacts of the proposed work include both broader adoption for the methodology as well as wider use of the new DKRLM functions to other search-based NLP applications that aim at generating fluent grammatical text. This includes search-based approaches to Speech Recognition, Natural Language Generation (NLG), Optical Character Recognition (OCR), Summarization, and others.
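
The intuition in the abstract (hypotheses whose features appear more often in a large clean corpus are more likely to be grammatical, with discriminative training combining those features into one score) can be illustrated with a toy sketch. This is not the project's implementation. The abstract does not say which linguistic features DKRLM uses, so word n-gram coverage is assumed here as a stand-in, and a simple pairwise perceptron is swapped in for whatever discriminative training the project actually performed. All function names (`build_index`, `coverage_features`, `train_perceptron`) are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[Ngram]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(corpus: Iterable[str], max_n: int = 3) -> Set[Ngram]:
    """Collect every 1..max_n-gram seen in the (assumed clean) monolingual corpus."""
    seen: Set[Ngram] = set()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
    return seen


def coverage_features(hypothesis: str, seen: Set[Ngram], max_n: int = 3) -> List[float]:
    """For each n, the proportion of the hypothesis's n-grams found in the corpus."""
    tokens = hypothesis.lower().split()
    features = []
    for n in range(1, max_n + 1):
        grams = ngrams(tokens, n)
        hits = sum(1 for g in grams if g in seen)
        features.append(hits / len(grams) if grams else 0.0)
    return features


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scoring function over the coverage features."""
    return sum(f * w for f, w in zip(features, weights))


def train_perceptron(
    pairs: Iterable[Tuple[str, str]],
    seen: Set[Ngram],
    max_n: int = 3,
    epochs: int = 10,
    lr: float = 1.0,
) -> List[float]:
    """Pairwise perceptron: each pair is (better hypothesis, worse hypothesis).

    Whenever the better hypothesis does not outscore the worse one, the
    weights are moved toward the better hypothesis's feature vector.
    """
    pairs = list(pairs)
    weights = [0.0] * max_n
    for _ in range(epochs):
        for better, worse in pairs:
            fb = coverage_features(better, seen, max_n)
            fw = coverage_features(worse, seen, max_n)
            if score(fb, weights) <= score(fw, weights):
                weights = [w + lr * (b - c) for w, b, c in zip(weights, fb, fw)]
    return weights


# Toy data, invented for illustration only.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
seen = build_index(corpus)
weights = train_perceptron([("the cat sat on the rug", "cat the on sat rug the")], seen)
candidates = ["rug the sat dog on the", "the dog sat on the mat"]
best = max(candidates, key=lambda h: score(coverage_features(h, seen), weights))
```

In this toy setup the scrambled candidate shares all of its words with the corpus, so only the higher-order n-gram proportions separate it from the fluent one. That is why the learned weights end up on the bigram and trigram features rather than the unigram feature. A realistic system would need far larger corpora and richer features than this sketch assumes.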