
A scheme for continuous speech recognition in a large context based on the human process of spoken language recognition


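Basic information

Grant number:
03452164
Principal investigator:
FUJISAKI Hiroya
Amount:
$44,800
Host institution:
Science University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for General Scientific Research (B)
Fiscal year:
1991
Funding country:
Japan
Project period:
1991 to 1992
Project status:
Completed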

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-03452164/
Keywords:
Spoken Language; Human Processes of Recognition; Large Context; Continuous Speech; Speech Recognition System; Syntactic Information; Semantic Information; Discourse Information; Recognition Process; Human; Internal Lexicon; Lexicon Search

Project abstract

Most current systems for automatic speech recognition fail to achieve recognition performance comparable to that of human listeners, since they are constructed without attention to the human processes of spoken language recognition. From this point of view, the present study investigates the human processes and incorporates the findings into a scheme for automatic recognition of continuous speech in a large context. The main results are as follows.

1. Experimental investigation and modeling of the human processes of spoken language recognition

Using natural utterances with controlled acoustic, syntactic and semantic information as stimuli, the following findings were obtained on the human processes of spoken language recognition.
(1) The unit of speech recognition varies widely, from phones and syllables to words and phrases, depending on the experimental condition and context.
(2) Larger units generally require less accuracy of representation for correct recognition.
(3) The amount of acoustic information necessary for recognition of a given unit varies widely depending on the size of the context and the listener's prior knowledge.
(4) The accuracy and speed of access to the mental lexicon vary dynamically depending on the acoustic, syntactic, semantic and discourse information available to the listener.
Based on these findings, a model has been constructed for the human processes of spoken language recognition.

2. Proposal and implementation of a scheme for automatic recognition of spoken language

Based upon the above findings and the model, a scheme for automatic recognition of continuous speech in a large context has been proposed, featuring (1) use of multiple unit sizes and levels of accuracy in acoustic feature representation, (2) use of prosodic features for word and phrase boundary detection, and (3) extraction of syntactic, semantic, and idiosyncratic information from a large context. The main components of the system have been implemented.

3. Demonstration of the validity of the proposed scheme

The proposed scheme has been tested by recognition experiments on phones, syllables and words in continuous speech with a large context, and the results have confirmed the essential validity and feasibility of the proposed scheme.
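
Findings (3) and (4) above say that the acoustic evidence a listener needs depends on the context available. The abstract does not describe the project's actual model, so the following is only a minimal illustrative sketch of one standard way to express that idea: combining an acoustic likelihood with a context-dependent prior via Bayes' rule. The function name `word_posteriors`, the candidate words, and all probabilities are hypothetical and are not taken from the project.

```python
import math


def word_posteriors(acoustic_loglik, context_prior):
    """Combine acoustic evidence with a context prior (Bayes' rule).

    acoustic_loglik: dict word -> log P(acoustics | word)
    context_prior:   dict word -> P(word | context), all values > 0
    Returns a dict word -> P(word | acoustics, context).
    """
    # Unnormalised log posterior for each candidate word.
    log_post = {
        w: acoustic_loglik[w] + math.log(context_prior[w])
        for w in acoustic_loglik
    }
    # Normalise with log-sum-exp for numerical stability.
    m = max(log_post.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
    return {w: math.exp(v - log_z) for w, v in log_post.items()}


# Hypothetical example: the acoustics alone cannot separate two candidates.
acoustic = {"beach": math.log(0.5), "peach": math.log(0.5)}

# Without context, both words are equally likely a priori.
flat_prior = {"beach": 0.5, "peach": 0.5}
# With a (made-up) discourse context about the seaside, "beach" is favoured.
seaside_prior = {"beach": 0.9, "peach": 0.1}

no_context = word_posteriors(acoustic, flat_prior)
with_context = word_posteriors(acoustic, seaside_prior)
```

In this toy setting the same ambiguous acoustic input stays undecided without context but is resolved once a context prior is supplied, which is one way to read the claim that less acoustic information is needed when more context is available.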
