Modeling Pronunciation Variation for Universal Access to Speech Understanding

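Basic information

Award number:
9978025
Principal investigator:
Daniel Jurafsky
Amount:
$504,000
Awardee institution:
University of Colorado at Boulder
Awardee institution country:
United States
Award type:
Continuing Grant
Fiscal year:
1999
Funding country:
United States
Start and end dates:
1999-09-15 to 2004-08-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9978025&HistoricalAwards=false
Keywords:
Modeling Pronunciation Variation Universal Access

Project abstract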

In order for speech-based interfaces to be universally accessible - that is, available to all citizens in all situations - they must deal successfully with massive variability in pronunciation, as studies have shown that this is by far the single largest source of error in machine speech recognition. This project will address that issue by building dynamic pronunciation models which use context to predict the most likely pronunciations of a word in a particular utterance. The PIs' previous work has established that speakers are more likely to delete certain sounds (e.g., the final "t" in "about") in a variety of situations: if the next word starts with certain consonants; if they are speaking quickly; if the word is unsurprising (has a high trigram probability); if they are young and male; if the word is not followed by a pause or a word-repetition; or if they are speakers of certain dialects. They have further shown how confidence metrics can be used to identify inaccurate dictionary pronunciations based on actual pronunciations in speech. This project will extend and combine both of these lines of research, by using phonetic-confidence models to identify words with incorrect pronunciations that contribute to word-error, and then building statistical (decision-tree) models of pronunciation variation for these words. The PIs' model provides a general engine which they will apply to several specific classes of pronunciation variation, including dialect and speaker type, embedded in the Sphinx-II speech recognizer. The application of computational models to the study of pronunciation variation is an important new research area in linguistics and cognitive modeling, with implications for and potential benefits to many areas of science and pedagogy, including universal access.

Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Daniel Jurafsky

ReFT: Representation Finetuning for Language Models

Publication year:
2024
Venue:
arXiv.org
Impact factor:
0
Authors:
Zhengxuan Wu; Aryaman Arora; Zheng Wang; Atticus Geiger; Daniel Jurafsky; Christopher D. Manning; Christopher Potts
Corresponding author:
Christopher Potts