RI: Medium: Collaborative Research: Semi-Supervised Discriminative Training of Language Models


Basic Information

  • Award Number:
    0963898
  • Principal Investigator:
  • Amount:
    $0.5 million
  • Host Institution:
  • Host Institution Country:
    United States
  • Award Type:
    Continuing Grant
  • Fiscal Year:
    2010
  • Funding Country:
    United States
  • Project Period:
    2010-06-01 to 2015-08-31
  • Status:
    Completed

Project Abstract

This project is conducting fundamental research in statistical language modeling to improve human language technologies, including automatic speech recognition (ASR) and machine translation (MT). A language model (LM) is conventionally optimized, using text in the target language, to assign high probability to well-formed sentences. This method has a fundamental shortcoming: the optimization does not explicitly target the kinds of distinctions necessary to accomplish the task at hand, such as discriminating (for ASR) between different words that are acoustically confusable or (for MT) between different target-language words that express the multiple meanings of a polysemous source-language word. Discriminative optimization of the LM, which would overcome this shortcoming, requires large quantities of paired input-output sequences: speech and its reference transcription for ASR, or source-language (e.g. Chinese) sentences and their translations into the target language (say, English) for MT. Such resources are expensive, and their scarcity limits the efficacy of discriminative training methods. In a radical departure from convention, this project is investigating discriminative training using easily available, *unpaired* input and output sequences: untranscribed speech or monolingual source-language text, and unpaired target-language text. Two key ideas are being pursued: (i) unlabeled input sequences (e.g. speech or Chinese text) are processed to learn the likely confusions encountered by the ASR or MT system; (ii) unpaired output sequences (English text) are leveraged to discriminate these well-formed sentences from the (supposedly) ill-formed sentences the system could potentially confuse them with. This self-supervised discriminative training, if successful, will advance machine intelligence in fundamental ways that impact many other applications.
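
The two key ideas above lend themselves to a compact illustration. The following is a minimal Python sketch, not the project's actual system: a hard-coded CONFUSIONS table stands in for the confusions that would be learned from unlabeled inputs (idea i), and a perceptron-style update over bigram features trains a simple log-linear LM so that each unpaired, well-formed sentence outscores its confusable variants (idea ii). The names, feature set, and update rule are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

# Hypothetical confusion table: in the project, confusions would be *learned*
# from unlabeled inputs (idea i); here they are hard-coded for illustration.
CONFUSIONS = {
    "two": ["to", "too"],
    "their": ["there"],
    "it's": ["its"],
}

def confusable_variants(sentence):
    """Yield alternative sentences obtained by substituting confusable words."""
    options = [[w] + CONFUSIONS.get(w, []) for w in sentence.split()]
    for combo in product(*options):
        candidate = " ".join(combo)
        if candidate != sentence:
            yield candidate

def bigram_features(sentence):
    """Sparse bigram indicator features for a simple log-linear LM."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    feats = defaultdict(float)
    for a, b in zip(words, words[1:]):
        feats[(a, b)] += 1.0
    return feats

def score(weights, sentence):
    return sum(weights.get(f, 0.0) * v for f, v in bigram_features(sentence).items())

def train(unpaired_text, epochs=5):
    """Perceptron-style discriminative training on unpaired target-language text:
    each well-formed sentence should outscore its best confusable variant (idea ii)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for gold in unpaired_text:
            rivals = list(confusable_variants(gold))
            if not rivals:
                continue
            best_rival = max(rivals, key=lambda s: score(weights, s))
            if score(weights, best_rival) >= score(weights, gold):
                # Reward features of the well-formed sentence, penalize the rival's.
                for f, v in bigram_features(gold).items():
                    weights[f] += v
                for f, v in bigram_features(best_rival).items():
                    weights[f] -= v
    return weights

if __name__ == "__main__":
    # Unpaired, well-formed target-language sentences; no transcribed speech
    # or parallel translations are required.
    corpus = ["their flight leaves in two hours", "it's too late to call"]
    w = train(corpus)
    print(score(w, "their flight leaves in two hours"),
          score(w, "there flight leaves in to hours"))
```

In the actual project, the confusion model would be estimated from the ASR or MT system's behavior on unlabeled inputs, and the language model and training objective would be considerably richer; the sketch only mirrors the structure of the two ideas.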

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Publications by Sanjeev Khudanpur

Getting more from automatic transcripts for semi-supervised language modeling
  • DOI:
    10.1016/j.csl.2015.08.007
  • Published:
    2016-03-01
  • Journal:
  • Impact Factor:
  • Authors:
    Scott Novotney;Richard Schwartz;Sanjeev Khudanpur
  • Corresponding Author:
    Sanjeev Khudanpur
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data
  • DOI:
    10.1016/j.csl.2022.101410
  • Published:
    2023-01-01
  • Journal:
  • Impact Factor:
  • Authors:
    Matthew Maciejewski;Jing Shi;Shinji Watanabe;Sanjeev Khudanpur
  • Corresponding Author:
    Sanjeev Khudanpur
Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek Memorial workshop
  • DOI:
  • Published:
    2015
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Hynek Hermansky;Lukas Burget;Jordan Cohen;Emmanuel Dupoux;Naomi Feldman;John Godfrey;Sanjeev Khudanpur;Matthew Maciejewski;Sri Harish Mallidi;Anjali Menon;Tetsuji Ogawa;Vijayaditya Peddinti;Richard Rose;Richard Stern;Matthew Wiesner;Karel Ve
  • Corresponding Author:
    Karel Ve

Other Grants by Sanjeev Khudanpur

CCRI: ENS: Next Generation Tools for Spoken Language Science & Technology
  • Award Number:
    2120435
  • Fiscal Year:
    2021
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Cross-Cutting Research Workshops on Intelligent Information Systems
  • Award Number:
    1005411
  • Fiscal Year:
    2010
  • Amount:
    $0.5 million
  • Award Type:
    Continuing Grant
SGER: Self-Supervised Discriminative Training of Statistical Language Models
  • Award Number:
    0840112
  • Fiscal Year:
    2008
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
PIRE: Investigation of Meaning Representations in Language Understanding for Machine Translation Systems
  • Award Number:
    0530118
  • Fiscal Year:
    2005
  • Amount:
    $0.5 million
  • Award Type:
    Continuing Grant
SGER: Pronunciation Modeling for Conversational Speech Recognition
  • Award Number:
    9714169
  • Fiscal Year:
    1997
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant

Similar International Grants

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312841
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312842
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313151
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312840
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313149
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Continuing Grant
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
  • Award Number:
    2312955
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Award Number:
    2313137
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313150
  • Fiscal Year:
    2023
  • Amount:
    $0.5 million
  • Award Type:
    Continuing Grant