
Spontaneous speech recognition


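Basic Information

Grant number:
15500098
Principal investigator:
KOHDA Masaki
Amount:
$20,500
Host institution:
Yamagata University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2003
Funding country:
Japan
Project period:
2003 to 2005
Project status:
Completed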

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-15500098/
Keywords:
Corpus of Spontaneous Japanese; Spontaneous speech recognition; Robust speech recognition; Acoustic model; Language model; Unsupervised adaptation; Continuous-mixture HMM; Discrete-mixture HMM; Speech recognition; Pronunciation-variation-dependent model; MLLR; Part-of-speech N-gram

Project Abstract

We investigated spontaneous speech recognition on an academic lecture task and obtained the following results.

(1) Lecture speech recognition using pronunciation variant modeling and unsupervised adaptation

We focus on the pronunciation variations observed in spontaneous speech. To capture the context dependence of pronunciation variants, we propose a new language modeling method based on morphological analysis data designed for pronunciation variants. The proposed method was evaluated on the Corpus of Spontaneous Japanese (CSJ) and reduced the word error rate (WER) by 4.74% absolute. In addition, unsupervised adaptation of both the acoustic and language models was introduced to improve recognition performance further. As a result, the WER decreased from 19.96% without adaptation to 15.41% with unsupervised adaptation.

(2) Lecture speech recognition using discrete-mixture HMMs

We have investigated noisy speech recognition using discrete-mixture HMMs (DMHMMs) and found that DMHMMs outperformed continuous-mixture HMMs under environmental noise and impulsive noise conditions. However, it was not clear whether this method is effective in clean conditions. The aim of this investigation is to evaluate the performance of the DMHMM system in clean conditions. For the evaluation, we chose the CSJ because we wanted to compare our system with other recognition systems on a common speech corpus and to clarify its performance on this more difficult task. In the recognition experiments, 3000-state DMHMMs (16 mixture components per state) were used as acoustic models. The language model, which represents pronunciation variation, was trained on 6.86 million words from 2668 lectures in the CSJ. As a result, the system obtained a WER of 20.30% on 10 academic lectures given by male speakers, demonstrating the effectiveness of the proposed method.
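
The results above are all stated as word error rates, so a short illustration of how WER and its "absolute" reduction are commonly computed may help. The following is a minimal sketch, not the project's scoring tool: the function names `word_error_rate` and `wer_reduction` and the toy sentences are assumptions made for this example. It uses the standard definition (substitutions + deletions + insertions, divided by the number of reference words) and then applies simple arithmetic to the 19.96% and 15.41% figures quoted in the abstract. The relative reduction it prints is derived here and is not reported in the source.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline, improved):
    """Return (absolute, relative) reduction for two WER values in percent."""
    absolute = baseline - improved
    relative = absolute / baseline * 100.0
    return absolute, relative


# Toy example (hypothetical sentences): one substitution and one deletion
# against a five-word reference.
ref = "we investigated spontaneous speech recognition".split()
hyp = "we investigate spontaneous recognition".split()
print(word_error_rate(ref, hyp))

# Arithmetic on the adaptation figures quoted in the abstract.
abs_red, rel_red = wer_reduction(19.96, 15.41)
print(round(abs_red, 2), round(rel_red, 1))
```

An absolute reduction is a difference in percentage points, while a relative reduction divides that difference by the baseline. The 4.74% figure in part (1) is reported as absolute, so it should be read as percentage points rather than as a fraction of the baseline WER.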
