EAGER: Matching Non-Native Transcribers to the Distinctive Features of the Language Transcribed

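Basic Information

Award number: 1550145
Principal investigator: Mark Hasegawa-Johnson
Amount: $150,000
Institution: University of Illinois at Urbana-Champaign
Institution country: United States
Award type: Standard Grant
Fiscal year: 2015
Funding country: United States
Duration: 2015-08-01 to 2018-07-31
Status: Completed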

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1550145&HistoricalAwards=false
Keywords: EAGER Matching Non Native Transcribers

Project Abstract

Automatic speech recognition (ASR) systems must be trained using hundreds of hours of speech with synchronized text transcriptions. Transcribing that much speech is beyond the means of most language communities; therefore, ASR systems do not exist for most languages. To overcome this bottleneck, this exploratory EAGER project asks people who don't understand a particular language to transcribe it as if they were listening to nonsense syllables. Of course, when people try to transcribe speech in a language they don't understand, they make mistakes. However, those mistakes follow patterns that can be modeled using decoding strategies developed for telephone and wireless communication, and these models can be used to route each transcription task to people whose native language helps them perform it. The resulting transcriptions are then fused to recover correct transcriptions. Five different languages will be tested, including languages with lexical tone and languages with a variety of consonant contrasts very different from those of English. The resulting transcriptions can then be used to train ASR systems in all five languages, and the quality of the research will be evaluated by its ability to train those systems without using transcriptions produced by native speakers. Mismatched crowdsourcing is formalized as a noisy channel: the talker encodes meaning in a string of symbols (phonemes), not all of which the perceiver can reliably distinguish. A model of second-language speech perception for each transcriber can be initialized using a perceptual assimilation model and then specialized to that transcriber. In particular, this proposal seeks to increase the scale and robustness of mismatched crowdsourcing by using error-correcting codes to divide the transcription task, and then distributing each sub-task to transcribers whose native language contains the requested distinctive feature.
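The noisy-channel, routing, and fusion ideas above can be illustrated with a toy sketch. Everything in it is a hypothetical assumption, not the project's actual method: a four-phoneme inventory encoded as binary distinctive-feature codewords (`CODEBOOK`), made-up per-transcriber error rates (`ERROR_RATES`), and the helper names `route`, `feature_llr`, and `decode`. Each transcriber is treated as an independent binary symmetric channel for each feature question, and the fused answer is the maximum-likelihood codeword under a uniform prior.

```python
import math

# Toy, hypothetical inventory: each phoneme is a codeword over binary
# distinctive features. Not the project's actual phoneme set or code.
FEATURES = ("voiced", "aspirated", "nasal")
CODEBOOK = {
    "p":  (0, 0, 0),
    "ph": (0, 1, 0),
    "b":  (1, 0, 0),
    "m":  (1, 0, 1),
}

# Hypothetical flip probabilities: how often each transcriber answers a
# given feature question wrongly. Transcribers whose native language uses
# a feature contrastively (e.g. aspiration in Hindi) are assumed more reliable.
ERROR_RATES = {
    "hindi_1":   {"voiced": 0.10, "aspirated": 0.05, "nasal": 0.05},
    "hindi_2":   {"voiced": 0.10, "aspirated": 0.08, "nasal": 0.05},
    "english_1": {"voiced": 0.10, "aspirated": 0.40, "nasal": 0.05},
    "english_2": {"voiced": 0.12, "aspirated": 0.45, "nasal": 0.05},
}


def route(feature, k=2):
    """Assign a feature question to the k transcribers least likely to err on it."""
    return sorted(ERROR_RATES, key=lambda t: ERROR_RATES[t][feature])[:k]


def feature_llr(feature, answers):
    """log P(answers | feature=1) - log P(answers | feature=0).

    Each transcriber is modeled as an independent binary symmetric channel
    whose flip probability is ERROR_RATES[transcriber][feature].
    """
    llr = 0.0
    for transcriber, bit in answers:
        e = ERROR_RATES[transcriber][feature]
        weight = math.log((1 - e) / e)  # larger for more reliable transcribers
        llr += weight if bit == 1 else -weight
    return llr


def decode(answers_by_feature):
    """Maximum-likelihood phoneme under a uniform prior over CODEBOOK."""
    llrs = {f: feature_llr(f, answers_by_feature[f]) for f in FEATURES}

    def score(code):
        # Up to a constant shared by all codewords, a codeword's
        # log-likelihood is the sum of the LLRs of the features it sets to 1.
        return sum(llrs[f] for f, bit in zip(FEATURES, code) if bit == 1)

    return max(CODEBOOK, key=lambda ph: score(CODEBOOK[ph]))


answers = {
    "voiced":    [("hindi_1", 0), ("english_1", 0)],
    "aspirated": [("hindi_1", 1), ("english_1", 0)],  # the two disagree
    "nasal":     [("hindi_2", 0), ("english_2", 0)],
}
print(route("aspirated"))  # ['hindi_1', 'hindi_2']
print(decode(answers))     # ph
```

Even when an aspiration question reaches an English-speaking transcriber, the disagreement is resolved in favor of the more reliable Hindi-speaking transcriber, so the fused result is `ph`. In this toy codebook some codewords differ in only one feature, so the only redundancy comes from asking several transcribers the same question; a genuine error-correcting design would add redundant feature questions so that codewords lie farther apart.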
It also seeks to develop new theory at the intersection of two current fields, crowdsourcing (the learnability of a function under conditions of label noise) and grammar induction (the learnability of a function from one language to another), and to perform grammar induction under conditions of label noise. Preliminary bounds exist for some aspects of this problem; the proposed research is designed to develop more detailed theoretical results and to test and apply them, in order to determine whether serviceable ASR systems can be created for under-resourced languages without using fluent speakers of those languages to transcribe speech in those languages.

自动语音识别（ASR）系统必须使用数百小时的语音进行训练，并同步文本传输。大多数语言社区都无法转录这么多的语音，因此大多数语言都不存在ASR系统。为了克服这个瓶颈，EAGER这个探索性的项目要求不懂某种语言的人把它转录下来，就好像他们在听无意义的音节一样。当然，当人们试图用他们不理解的语言转录语音时，他们会犯错误。然而，这些错误有模式，可以使用为电话和无线通信开发的解码策略来建模，并用于将每个转录任务路由到母语帮助他们执行该任务的人。五种不同的语言将被测试，包括词汇语气的语言，以及与英语非常不同的各种辅音对比的语言。由此产生的transmittance可以用所有五种语言训练ASR系统，并且研究的质量基于其在不使用母语者产生的transmittance的情况下训练这些系统的能力进行评估。不匹配的众包被形式化为一个嘈杂的通道;说话者将意义编码在一串符号（音素）中，并非所有符号都能被感知者可靠地区分。第二语言的语音感知模型为每个转录器可以使用感知同化模型初始化，然后专门化。特别是，该提案寻求通过使用纠错码来划分转录任务，然后将每个子任务分配给其母语包含所请求的独特特征的转录者，来增加不匹配的众包的规模和鲁棒性。它还寻求在当前众包（标签噪声条件下函数的可学习性）和语法归纳（从一种语言到另一种语言的函数的可学习性）领域的交叉点上开发新理论，并在标签噪声条件下执行语法归纳。该问题的某些方面存在初步界限;拟议的研究旨在开发更详细的理论结果，并测试和应用它们，以确定为资源不足的语言创建可用的ASR系统的可行性，而不必使用这些语言的流利使用者来转录这些语言的语音。