Collaborative: Discriminative Knowledge-Rich Language Modeling for Machine Translation

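Basic information

Award number:
0712810
Principal investigator:
Rebecca Hwa
Amount:
--
Host institution:
University of Pittsburgh
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2007
Funding country:
United States
Start and end dates:
2007-09-01 to 2010-08-31
Project status:
Completed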

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0712810&HistoricalAwards=false
Keywords:
Collaborative Discriminative Knowledge Rich Language

Project abstract

This project investigates a novel approach for assessing the fluency and grammaticality of alternative translation hypotheses that are created within search-based Machine Translation (MT) systems. This task, commonly termed "Language Modeling" (LM), has been explored primarily in the context of speech recognition; however, current state-of-the-art language models (LMs) are not effective at distinguishing between more fluent grammatical translations and their poor alternatives. In contrast, the proposed approach, "Discriminative Knowledge-Rich Language Modeling" (DKRLM), is explicitly designed to find the most fluent and grammatical translations within the search space by comparing the linguistic features of the translation hypotheses against very large "clean" monolingual corpora. The intuition is that more grammatical translation hypotheses should contain higher proportions of features seen in the large corpora. An important contribution of the project is in exploring different types of linguistic features to identify those that are most informative for the comparisons. Moreover, discriminative training is performed to incorporate the features into a system-independent scoring function, replacing traditional LMs in MT systems. The broader impacts of the proposed work include both broader adoption for the methodology as well as wider use of the new DKRLM functions to other search-based NLP applications that aim at generating fluent grammatical text. This includes search-based approaches to Speech Recognition, Natural Language Generation (NLG), Optical Character Recognition (OCR), Summarization, and others.
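
The intuition in the abstract, that more grammatical hypotheses contain a higher proportion of features seen in a large clean corpus, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which linguistic features the project uses, so plain word bigrams stand in for them, and the function names (`build_feature_set`, `coverage`, `rank_hypotheses`) are hypothetical. The sketch also omits the discriminative training step; it only ranks hypotheses by raw feature coverage.

```python
from typing import Iterable, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_feature_set(corpus: Iterable[str], n: int = 2) -> Set[Ngram]:
    """Collect every n-gram observed in the (assumed clean) monolingual corpus.

    Word n-grams are a stand-in; the project's actual features are not
    specified in the abstract.
    """
    seen: Set[Ngram] = set()
    for sentence in corpus:
        seen.update(ngrams(sentence.lower().split(), n))
    return seen


def coverage(hypothesis: str, seen: Set[Ngram], n: int = 2) -> float:
    """Fraction of the hypothesis's n-grams that also occur in the corpus."""
    feats = ngrams(hypothesis.lower().split(), n)
    if not feats:
        return 0.0  # too short to yield any n-gram
    return sum(f in seen for f in feats) / len(feats)


def rank_hypotheses(hypotheses: List[str], seen: Set[Ngram], n: int = 2) -> List[str]:
    """Order translation hypotheses from highest to lowest feature coverage."""
    return sorted(hypotheses, key=lambda h: coverage(h, seen, n), reverse=True)


if __name__ == "__main__":
    # Toy data, invented for this example only.
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    seen = build_feature_set(corpus)
    for hyp in rank_hypotheses(["cat the on sat rug the", "the cat sat on the rug"], seen):
        print(f"{coverage(hyp, seen):.2f}  {hyp}")
```

In a fuller system along the lines the abstract describes, coverage values for several feature types would presumably be weighted by a discriminatively trained model instead of being used directly as the ranking key.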
