
Automatic evaluation of speech quality

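Basic Information

Grant number:
6882741
Principal investigator:
CHARLES S WATSON
Amount:
$100,000
Host institution:
COMMUNICATION DISORDERS TECHNOLOGY, INC
Host institution country:
United States
Project category:
Fiscal year:
2004
Funding country:
United States
Project period:
2004-09-24 to 2005-09-30
Project status:
Completed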

Source:
https://reporter.nih.gov/project-details/6882741
Keywords:
clinical research; hearing; human subject; nonEnglish language; sound perception; speech; technology /technique development; vocabulary; young adult human (21-34)

Project Abstract

DESCRIPTION (provided by applicant): Tests of several different approaches to the automatic evaluation of the quality of speech segments are proposed. Previous systems for use in pronunciation training have typically either employed automatic speech-recognition (ASR) technology or used templates based on a limited number of utterances rated as excellent by L1 listeners (sometimes also employing a second set of utterances containing a common pronunciation error). Here, speech-processing technologies (hidden Markov models, HMMs, and artificial neural networks, ANNs) will be developed specifically for use as evaluation systems (not recognition systems) to predict the quality and locus-of-error judgments assigned by listeners. The approach is termed "evaluation-of-single-words" (ESW); its special feature derives from the training tokens employed in development: multiple recordings of a single word made by groups of native and non-native talkers.
Sixty talkers will be native speakers of Arabic, whose intelligibility in English ranges from poor to near-perfect, and 60 talkers will be native speakers of middle-American English. Twelve words will be used, divided among one-, two-, and three-syllable items. Ten productions of each word will be recorded by each talker, yielding 14,400 tokens. Each token will be rated by listening juries for pronunciation quality, and the tokens will also be categorized into perceptual clusters using multidimensional scaling (MDS) and cluster-analysis techniques. At least two computer-based evaluation systems (HMM and ANN) will be trained for each individual word, with the goals of predicting overall pronunciation quality and identifying specific commonly occurring pronunciation errors. It is expected that these word-specific systems, each representing a discrete "evaluator" custom-built for an individual word, will approach the maximum accuracy that can be expected of this class of processors.
If successful, the ESW approach may have a broad range of applications in pronunciation training, identification of a speaker's L1, foreign-language instruction, and other non-lexical applications. However, our specific goal is the development of systems that can provide informative feedback during automated pronunciation training. In ASR applications, the goal is to respond the same way to a word, no matter how it is pronounced. The goal of an ESW system is to respond differentially to pronunciation variants. This distinction between ASR and ESW is central to the development of successful evaluation systems as it dictates different modeling constraints.
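The recording design described in the abstract can be checked with a short sketch. The counts (60 + 60 talkers, 12 words, 10 productions) come from the abstract; the function name, group labels, and tuple layout are hypothetical and only serve the illustration. It is not the applicants' software.

```python
from itertools import product

# Design figures taken from the abstract.
N_ARABIC_L1_TALKERS = 60
N_ENGLISH_L1_TALKERS = 60
N_WORDS = 12
N_PRODUCTIONS = 10


def build_token_index():
    """Enumerate one (group, talker, word, production) tuple per recorded token.

    The labels "arabic_l1" and "english_l1" are illustrative names,
    not identifiers from the project.
    """
    groups = {
        "arabic_l1": N_ARABIC_L1_TALKERS,
        "english_l1": N_ENGLISH_L1_TALKERS,
    }
    tokens = []
    for group, n_talkers in groups.items():
        for talker, word, take in product(
            range(n_talkers), range(N_WORDS), range(N_PRODUCTIONS)
        ):
            tokens.append((group, talker, word, take))
    return tokens


tokens = build_token_index()
# Tokens available to train the evaluator for a single word (word index 0).
per_word = sum(1 for t in tokens if t[2] == 0)

print(len(tokens))  # 14400, matching the figure in the abstract
print(per_word)     # 1200 tokens per word-specific evaluator
```

Under these figures each word-specific evaluator would see 1,200 rated tokens (120 talkers times 10 productions), split evenly between the two talker groups.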
