Audiovisual Distinctive-Feature-Based Recognition of Dysarthric Speech

Project abstract

Automatic dictation software with reasonably high word recognition accuracy is now widely available to the general public. Many people with gross motor impairments, including some people with cerebral palsy and closed head injuries, have not benefited from these advances, however. Their motor impairment often precludes normal use of a keyboard, and it also includes a component of dysarthria, that is, reduced speech intelligibility caused by neuro-motor impairment. For this reason, dysarthric users often find it easier to use a small-vocabulary automatic speech recognition system, with code words representing letters and formatting commands, and with acoustic speech recognition models carefully adapted to the speech of the individual user. But development of such individualized speech recognition systems remains extremely labor-intensive, because so little is understood about the general characteristics of dysarthric speech. In this project, the PI will study the general audio and visual characteristics of articulation errors in dysarthric speech, and apply the results to the development of speaker-independent large-vocabulary and small-vocabulary audio and audiovisual dysarthric speech recognition systems. More specifically, the PI will research word-based, phone-based, and phonologic-feature-based audio and audiovisual speech recognition models for both small-vocabulary and large-vocabulary speech recognizers designed for unrestricted text entry on a personal computer. The models will be based on audio and video analysis of phonetically balanced speech samples from a group of speakers with dysarthria, categorized into the following four groups: very low intelligibility (0-25% intelligibility, as rated by human listeners), low intelligibility (25-50%), moderate intelligibility (50-75%), and high intelligibility (75-100%).
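As a minimal sketch of the four-way grouping described above, the following Python function maps a listener-rated intelligibility score to its group. The function name and the treatment of boundary scores are assumptions made for illustration: the abstract gives overlapping range endpoints (25%, 50%, 75%) and does not say which group a boundary score belongs to, so this sketch assigns it to the higher group.

```python
from bisect import bisect_right

# Group labels as named in the abstract, from least to most intelligible.
GROUPS = ["very low", "low", "moderate", "high"]

# Cut points in percent. Assumption (not stated in the abstract): a score of
# exactly 25, 50, or 75 is placed in the higher of the two adjacent groups.
CUTS = [25.0, 50.0, 75.0]


def intelligibility_group(percent: float) -> str:
    """Map a listener-rated intelligibility score (0-100) to its group label."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"intelligibility must be in [0, 100], got {percent}")
    # bisect_right counts how many cut points are <= percent,
    # which is exactly the index of the group.
    return GROUPS[bisect_right(CUTS, percent)]
```

For example, under these assumptions `intelligibility_group(40.0)` falls in the "low" group, and a boundary score such as `25.0` is also assigned to "low".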
Interactive phonetic analysis will seek to describe the talker-dependent characteristics of articulation error in dysarthria; based on analysis of preliminary data, the PI hypothesizes that manner of articulation errors, place of articulation errors, and voicing errors are approximately independent events. Preliminary experiments also suggest that different dysarthric users will require dramatically different speech recognition architectures, because the symptoms of dysarthria vary so much from subject to subject, so the PI will develop and test at least three categories of audio-only and audiovisual speech recognition algorithms for dysarthric users: phone-based and whole-word recognizers using hidden Markov models (HMMs), phonologic-feature-based and whole-word recognizers using support vector machines (SVMs), and hybrid SVM-HMM recognizers. The models will be evaluated to determine overall recognition accuracy of each algorithm, changes in accuracy due to learning, group differences in accuracy due to severity of dysarthria, and dependence of accuracy on vocabulary size.
Broader Impacts: This research will lay the foundation for constructing a speech recognition tool for practical use by computer users with neuro-motor disabilities. Tools and data developed in this project will all be released open-source, and will be designed so they can be easily ported to an open-source audiovisual speech recognition system for dysarthric users. The work may also have applicability beyond the target community, in that project outcomes may be relevant to many other populations (e.g., people with foreign accents) who have trouble training current ASR systems.
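The hypothesis that manner, place, and voicing errors are approximately independent can be illustrated with a small check. If the three error types were exactly independent, the probability of every combination of errors would equal the product of the individual error rates. The Python sketch below is not the project's analysis method; the function name and the input format (one tuple of three booleans per phone token) are hypothetical. It reports the largest gap between the observed joint frequencies and the product of the marginal rates.

```python
from itertools import product


def independence_gap(errors):
    """Largest |observed joint - product of marginals| over all 8 outcomes.

    errors: list of (manner_err, place_err, voicing_err) boolean tuples,
    one per phone token (hypothetical input format for illustration).
    A value near 0 is consistent with approximately independent errors.
    """
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one token")
    # Marginal error rate for each of the three feature dimensions.
    marginals = [sum(e[i] for e in errors) / n for i in range(3)]
    gap = 0.0
    for outcome in product([False, True], repeat=3):
        # Observed relative frequency of this exact error combination.
        joint = sum(tuple(e) == outcome for e in errors) / n
        # Probability of the same combination if errors were independent.
        indep = 1.0
        for rate, flag in zip(marginals, outcome):
            indep *= rate if flag else 1.0 - rate
        gap = max(gap, abs(joint - indep))
    return gap
```

A gap of zero means the observed counts factor exactly into the three marginal rates. With real data, a statistical test on the contingency table would be needed to judge whether a nonzero gap is larger than chance.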

Project outcomes

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Mark Hasegawa-Johnson

FAI: A New Paradigm for the Evaluation and Training of Inclusive Automatic Speech Recognition
  • Award number:
    2147350
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Standard Grant
RI: Small: Collaborative Research: Automatic Creation of New Speech Sound Inventories
  • Award number:
    1910319
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project type:
    Standard Grant
EAGER: Matching Non-Native Transcribers to the Distinctive Features of the Language Transcribed
  • Award number:
    1550145
  • Fiscal year:
    2015
  • Funding amount:
    --
  • Project type:
    Standard Grant
FODAVA-Partner: Visualizing Audio for Anomaly Detection
  • Award number:
    0807329
  • Fiscal year:
    2008
  • Funding amount:
    --
  • Project type:
    Continuing Grant
RI Medium: Audio Diarization - Towards Comprehensive Description of Audio Events
  • Award number:
    0803219
  • Fiscal year:
    2008
  • Funding amount:
    --
  • Project type:
    Standard Grant
Prosodic, Intonational, and Voice Quality Correlates of Disfluency
  • Award number:
    0414117
  • Fiscal year:
    2004
  • Funding amount:
    --
  • Project type:
    Continuing Grant
CAREER: Landmark-Based Speech Recognition in Music and Speech Backgrounds
  • Award number:
    0132900
  • Fiscal year:
    2002
  • Funding amount:
    --
  • Project type:
    Continuing Grant

Similar overseas grants

Analysis of Scientific Characteristics of Osaka's Specialty Vegetables with Distinctive Shapes by Cooking Method and Application to Food Education
  • Award number:
    23K12697
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Investigation of Various Distinctive Features Based on Extensive Field Work and Acoustic Analysis of Tohoku Japanese
  • Award number:
    22K00516
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Mechanisms of the distinctive seasonality and variability of the activity of "South Coast Cyclone" along the Kuroshio
  • Award number:
    22K14097
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Distinctive role of ORAI3 channels in the effector T cell response
  • Award number:
    10054359
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
A method for developing distinctive capabilities for successful servitization in manufacturing companies
  • Award number:
    20K12082
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Role of distinctive cortical motor maps for hand muscles for recovery post stroke
  • Award number:
    10841118
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
Development of abnormal growth detection system by measurement of distinctive bio-signal in disease chick embryos
  • Award number:
    19K04433
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (C)
RI: Small: Toward Human-Level Face Verification Performance Using Distinctive Features
  • Award number:
    1909707
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project type:
    Standard Grant
Psychological, social & biological predictors of child mental health and development: shared and distinctive risk and protective factors in UK & India
  • Award number:
    MR/S036466/1
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project type:
    Research Grant
"Old" Children of Very Old Parents With Dementia: Distinctive Challenges and Support Needs
  • Award number:
    9714516
  • Fiscal year:
    2018
  • Funding amount:
    --
  • Project type: