Audiovisual Distinctive-Feature-Based Recognition of Dysarthric Speech

Basic Information

Award number: 0534106
Principal investigator: Mark Hasegawa-Johnson
Amount: --
Host institution: University of Illinois at Urbana-Champaign
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2005
Funding country: United States
Project period: 2005-11-15 to 2009-10-31
Project status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=0534106&HistoricalAwards=false
Keywords: Audiovisual Distinctive Feature Based Recognition

Project Abstract

Automatic dictation software with reasonably high word recognition accuracy is now widely available to the general public. Many people with gross motor impairments, including some people with cerebral palsy and closed head injuries, have not enjoyed the benefit of these advances, however, because their general motor impairment includes a component of dysarthria, that is to say reduced speech intelligibility caused by neuro-motor impairment, while the motor impairment often precludes normal use of a keyboard. For this reason, dysarthric users often now find it easier to use a small-vocabulary automatic speech recognition system, with code words representing letters and formatting commands, and with acoustic speech recognition models carefully adapted to the speech of the individual user. But development of such individualized speech recognition systems remains extremely labor-intensive, because so little is understood about the general characteristics of dysarthric speech. In this project, the PI will study the general audio and visual characteristics of articulation errors in dysarthric speech, and apply the results to the development of speaker-independent large-vocabulary and small-vocabulary audio and audiovisual dysarthric speech recognition systems. More specifically, the PI will research word-based, phone-based, and phonologic-feature-based audio and audiovisual speech recognition models for both small-vocabulary and large-vocabulary speech recognizers designed for unrestricted text entry on a personal computer. The models will be based on audio and video analysis of phonetically balanced speech samples from a group of speakers with dysarthria, categorized into the following four groups: very low intelligibility (0-25% intelligibility, as rated by human listeners), low intelligibility (25-50%), moderate intelligibility (50-75%), and high intelligibility (75-100%). 
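The four intelligibility groups above can be sketched as a simple binning function. This is only an illustration: the abstract gives overlapping boundaries (25%, 50%, 75%) without saying which group a boundary score belongs to, so the half-open intervals used here are an assumption, and the function name `intelligibility_group` is hypothetical.

```python
def intelligibility_group(percent: float) -> str:
    """Map a listener-rated intelligibility score (0-100) to a group label.

    Assumption (not stated in the abstract): each interval is half-open,
    [lower, upper), so a score of exactly 25 counts as "low", and a score
    of exactly 100 counts as "high".
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"intelligibility must be in [0, 100], got {percent}")
    if percent < 25.0:
        return "very low"
    if percent < 50.0:
        return "low"
    if percent < 75.0:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical listener ratings, for illustration only.
    for score in (10.0, 25.0, 62.5, 100.0):
        print(score, intelligibility_group(score))
```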
Interactive phonetic analysis will seek to describe the talker-dependent characteristics of articulation error in dysarthria; based on analysis of preliminary data, the PI hypothesizes that manner of articulation errors, place of articulation errors, and voicing errors are approximately independent events. Preliminary experiments also suggest that different dysarthric users will require dramatically different speech recognition architectures, because the symptoms of dysarthria vary so much from subject to subject, so the PI will develop and test at least three categories of audio-only and audiovisual speech recognition algorithms for dysarthric users: phone-based and whole-word recognizers using hidden Markov models (HMMs), phonologic-feature-based and whole-word recognizers using support vector machines (SVMs), and hybrid SVM-HMM recognizers. The models will be evaluated to determine overall recognition accuracy of each algorithm, changes in accuracy due to learning, group differences in accuracy due to severity of dysarthria, and dependence of accuracy on vocabulary size.

Broader Impacts: This research will lay the foundation for constructing a speech recognition tool for practical use by computer users with neuro-motor disabilities. Tools and data developed in this project will all be released open-source, and will be designed so they can be easily ported to an open-source audiovisual speech recognition system for dysarthric users. The work may also have applicability beyond the target community, in that project outcomes may be relevant to many other populations (e.g., people with foreign accents) who have trouble training current ASR systems.
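The abstract's hypothesis that manner, place, and voicing errors are approximately independent can be made concrete with a small check: for each pair of error types, compare the observed joint error rate with the product of the two marginal rates. The sketch below is not the project's method. The record format (one tuple of three booleans per produced phone) and the function name `independence_gaps` are assumptions made for illustration.

```python
from itertools import combinations

ERROR_TYPES = ("manner", "place", "voicing")


def independence_gaps(records):
    """Return P(A and B) - P(A) * P(B) for each pair of error types.

    `records` is a list of (manner_err, place_err, voicing_err) booleans,
    one tuple per produced phone (a hypothetical annotation format).
    A gap near zero is consistent with the two error types being
    independent; it does not prove independence.
    """
    n = len(records)
    if n == 0:
        raise ValueError("need at least one record")
    # Marginal rate of each error type.
    marginals = [sum(1 for r in records if r[i]) / n for i in range(3)]
    gaps = {}
    for i, j in combinations(range(3), 2):
        # Observed rate at which both error types occur on the same phone.
        joint = sum(1 for r in records if r[i] and r[j]) / n
        gaps[(ERROR_TYPES[i], ERROR_TYPES[j])] = joint - marginals[i] * marginals[j]
    return gaps


if __name__ == "__main__":
    # Made-up annotations, for illustration only.
    sample = [
        (True, False, False),
        (False, True, False),
        (True, True, False),
        (False, False, True),
        (False, False, False),
    ]
    for pair, gap in independence_gaps(sample).items():
        print(pair, round(gap, 3))
```

With real annotations, the gaps would normally be paired with a significance test, since small samples can show nonzero gaps by chance.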
