A Study on Constructing Various Acoustic Models using Distributed Speech Corpora

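Basic information

Grant number:
15200014
Principal investigator:
TAKEDA Kazuya
Amount:
$293,700
Host institution:
Nagoya University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (A)
Fiscal year:
2003
Funding country:
Japan
Project period:
2003 to 2005
Project status:
Completed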

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-15200014/
Keywords:
speech recognition; acoustic model; speech corpus; distributed database; distributed training; sufficient statistics; speaker adaptation; distributed processing; HMM; model interpolation; continuous speech recognition

Project abstract

To collect speech uttered under various environmental conditions, field tests of spoken dialogue systems were conducted for public transportation guidance, in-car information retrieval, and guidance in a public space. Based on the three resulting corpora, a prototype data-sharing infrastructure for acoustic model training was developed. In the system, one can search for particular speech subsets by issuing queries on speaker age, utterance SNR, and phoneme frequency distribution. The system can train a set of HMMs by sharing the sufficient statistics, i.e., the visiting count, the branching count, the sum, and the square sum, for the Gaussian mixture pdfs of each state of the HMM acoustic models. In addition, to characterize utterances, a blind SNR estimation method, i.e., one that does not require explicit voice activity detection (VAD), was developed for a wide range of SNRs. As for the training strategy, not only maximum likelihood (ML) training over the set of utterances but also a model adaptation method that uses only the statistics was studied. The effectiveness of the adaptation approach using pre-stored statistics for each utterance was confirmed through recognition experiments, in which the accuracy of the model trained by adaptation was almost equivalent to that of the model trained with the pooled EM algorithm.
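
The idea of training from shared statistics rather than shared audio can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a single one-dimensional Gaussian, hard frame assignment (each frame counts once, whereas EM training would weight frames by posterior probability), and hypothetical names (`GaussStats`, `Utterance`, `pool`, `estimate`) and metadata fields. It shows only that the count, sum, and square sum stored per utterance can be selected by a metadata query and added together to give the same estimate as pooling the raw frames.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GaussStats:
    """Sufficient statistics for one 1-D Gaussian (hypothetical layout)."""
    count: float = 0.0     # number of frames assigned to this Gaussian
    total: float = 0.0     # sum of the observations
    sq_total: float = 0.0  # sum of the squared observations

    def __add__(self, other: "GaussStats") -> "GaussStats":
        # Statistics from different sites or utterances simply add up.
        return GaussStats(self.count + other.count,
                          self.total + other.total,
                          self.sq_total + other.sq_total)


@dataclass
class Utterance:
    """Pre-stored statistics plus the metadata used for queries (assumed fields)."""
    speaker_age: int
    snr_db: float
    stats: GaussStats


def accumulate(frames: Iterable[float]) -> GaussStats:
    """Collect statistics from raw frames (hard assignment, one count per frame)."""
    stats = GaussStats()
    for x in frames:
        stats = stats + GaussStats(1.0, x, x * x)
    return stats


def pool(utterances: List[Utterance], min_snr_db: float, max_age: int) -> GaussStats:
    """Sum the statistics of every utterance that matches the query."""
    pooled = GaussStats()
    for u in utterances:
        if u.snr_db >= min_snr_db and u.speaker_age <= max_age:
            pooled = pooled + u.stats
    return pooled


def estimate(stats: GaussStats) -> Tuple[float, float]:
    """Maximum likelihood mean and variance from the pooled statistics."""
    mean = stats.total / stats.count
    variance = stats.sq_total / stats.count - mean * mean
    return mean, variance


if __name__ == "__main__":
    corpus = [
        Utterance(speaker_age=25, snr_db=20.0, stats=accumulate([1.0, 2.0, 3.0])),
        Utterance(speaker_age=40, snr_db=15.0, stats=accumulate([4.0, 5.0])),
        Utterance(speaker_age=70, snr_db=5.0, stats=accumulate([100.0])),
    ]
    print(estimate(pool(corpus, min_snr_db=10.0, max_age=60)))
```

Because only three numbers per Gaussian leave each site, the raw recordings never have to be copied, and a new subset can be trained on by re-summing stored statistics instead of reprocessing audio.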
