
Model and example based prosodic feature extraction and its efficient integration for speech recognition along with phoneme-based recognition

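Basic Information

Grant number: 08680391
Principal investigator: SHIMODAIRA Hiroshi
Amount: $16,600
Host institution: Japan Advanced Institute of Science and Technology, Hokuriku
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research (C)
Fiscal year: 1996
Funding country: Japan
Period: 1996 to 1998
Project status: Completed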

Source: https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-08680391/
Keywords: prosody, prosodic boundary, pitch pattern, speech recognition, Fujisaki model (藤崎モデル)

Project Abstract

The aim of this research is to exploit the prosodic information contained in speech for automatic speech recognition, a task in which prosodic information, like phonemic information, plays an important role.

(a) Robust pitch determination algorithm: Unlike conventional pitch trackers, which rely on numerical curve fitting, the proposed method estimates the continuous F0 pattern with a quantitative pitch generation model of the kind often used to synthesize F0 contours from prosodic event commands. An inverse filtering technique is employed to obtain initial candidates for the prosodic commands, and a beam search algorithm combined with an N-best technique is used to find the optimal command sequence among these candidates efficiently. Preliminary experiments on a male speaker from the ATR B-set database gave promising results, both in the quality of the restored F0 pattern and in the estimation of prosodic events. In addition to this improved F0 smoothing technique, a novel frame-wise pitch determination algorithm was proposed that also gives the reliability of each pitch frequency estimate.

(b) Prosodically guided speech recognition:
i. As a first step toward speech recognition based on prosodic information, an isolated word recognition task in a noisy environment was studied. Experiments showed that the word pitch pattern helps reduce ambiguity when discriminating similar words.
ii. It was shown that the dependencies between consecutive phrases can be measured by means of prosodic features; an accuracy of 87% was obtained on the ATR read speech data.
iii. A prototype prosodically guided speech recognition system was developed, in which phrase hypotheses produced by phoneme recognition are rescored on the basis of the likelihood of phrase boundaries measured by prosodic features.
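
The quantitative pitch generation model in (a) is presumably the Fujisaki model listed in the keywords. The Python sketch below synthesizes ln F0(t) with the standard formulation of that model: ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ Aa·[Ga(t − T1) − Ga(t − T2)], where Gp(t) = α²·t·e^(−αt) and Ga(t) = min[1 − (1 + βt)·e^(−βt), γ] for t ≥ 0 (both are zero for t < 0). The function names are hypothetical, and the constants α = 3.0, β = 20.0 and γ = 0.9 are typical values assumed here, not values reported by this project.

```python
import math
from typing import List, Sequence, Tuple

PhraseCmd = Tuple[float, float]         # (onset T0 [s], amplitude Ap)
AccentCmd = Tuple[float, float, float]  # (onset T1 [s], offset T2 [s], amplitude Aa)


def phrase_response(t: float, alpha: float) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_ln_f0(
    times: Sequence[float],
    fb: float,
    phrase_cmds: Sequence[PhraseCmd],
    accent_cmds: Sequence[AccentCmd],
    alpha: float = 3.0,   # assumed typical value
    beta: float = 20.0,   # assumed typical value
    gamma: float = 0.9,   # assumed typical value
) -> List[float]:
    """Return ln F0(t) at each time: ln Fb + phrase components + accent components."""
    out = []
    for t in times:
        y = math.log(fb)  # baseline frequency Fb
        for t0, ap in phrase_cmds:
            y += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accent_cmds:
            # An accent command is a pulse: step response at onset minus at offset.
            y += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
        out.append(y)
    return out


# Usage: 3 s at a 10 ms frame rate, one phrase command and one accent command.
times = [i * 0.01 for i in range(300)]
ln_f0 = fujisaki_ln_f0(times, fb=100.0,
                       phrase_cmds=[(0.0, 0.5)],
                       accent_cmds=[(0.5, 1.0, 0.3)])
f0_hz = [math.exp(v) for v in ln_f0]
```

Working in the log-frequency domain makes the phrase and accent components additive, which is what allows estimated commands to be compared directly against an observed ln F0 contour.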

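The command search in (a) can be illustrated with a small beam search. This is a minimal sketch under stated assumptions, not the project's algorithm: it handles phrase commands only, takes the candidate list (which the project obtains by inverse filtering) as given, decides for each candidate in time order whether to keep or skip it, scores every partial hypothesis by the squared error between its synthesized ln F0 and the observed contour, keeps the `beam_width` best hypotheses at each step, and returns the `n_best` best at the end. All names, the error measure, and the parameter values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Command = Tuple[float, float]  # hypothetical phrase command: (onset T0 [s], amplitude Ap)


def phrase_ln_f0(times: Sequence[float], ln_fb: float,
                 cmds: Sequence[Command], alpha: float = 3.0) -> List[float]:
    """ln F0 from phrase commands only (Fujisaki-style phrase component)."""
    out = []
    for t in times:
        y = ln_fb
        for t0, ap in cmds:
            d = t - t0
            if d >= 0:
                y += ap * alpha * alpha * d * math.exp(-alpha * d)
        out.append(y)
    return out


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def beam_search_commands(times, observed, ln_fb, candidates,
                         beam_width: int = 4, n_best: int = 2):
    """Return up to n_best (error, commands) pairs, lowest error first.

    candidates must be sorted by onset time; each one is either kept or skipped.
    """
    def score(cmds):
        return squared_error(phrase_ln_f0(times, ln_fb, cmds), observed)

    beam: List[Tuple[Tuple[Command, ...], float]] = [((), score(()))]
    for cand in candidates:
        expanded = []
        for chosen, _ in beam:
            for hyp in (chosen, chosen + (cand,)):  # skip vs. keep this candidate
                expanded.append((hyp, score(hyp)))
        expanded.sort(key=lambda h: h[1])
        beam = expanded[:beam_width]  # prune to the best partial hypotheses
    return [(err, list(cmds)) for cmds, err in beam[:n_best]]


# Usage: recover two true commands from three candidates.
times = [i * 0.01 for i in range(300)]
ln_fb = math.log(100.0)
observed = phrase_ln_f0(times, ln_fb, [(0.0, 0.5), (1.0, 0.3)])
candidates = [(0.0, 0.5), (0.5, 0.4), (1.0, 0.3)]
results = beam_search_commands(times, observed, ln_fb, candidates)
```

Because the beam is pruned after every candidate, the number of scored hypotheses grows linearly with the number of candidates instead of exponentially, at the cost of possibly discarding a hypothesis that would only become best later.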
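
Item iii of (b) rescores phrase hypotheses from phoneme recognition using the likelihood of phrase boundaries. One common way to do this, assumed here rather than taken from the source, is a weighted log-linear combination: each hypothesis's phoneme-recognition log-probability is added to `weight` times the summed log-likelihood of its hypothesized phrase boundaries, and the list is re-sorted. The `PhraseHypothesis` class, the `boundary_loglik` callback, the toy prosodic scores, and the weight are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PhraseHypothesis:
    """Hypothetical container for one phrase hypothesis from phoneme recognition."""
    words: str                 # recognized phrase sequence (label only)
    acoustic_logprob: float    # log-probability from phoneme-based recognition
    boundaries: Sequence[int]  # frame indices of the hypothesized phrase boundaries


def rescore(
    hyps: Sequence[PhraseHypothesis],
    boundary_loglik: Callable[[int], float],
    weight: float = 1.0,  # assumed interpolation weight for the prosodic score
) -> List[PhraseHypothesis]:
    """Re-rank hypotheses by acoustic score + weight * prosodic boundary score."""
    scored = []
    for h in hyps:
        prosodic = sum(boundary_loglik(b) for b in h.boundaries)
        scored.append((h.acoustic_logprob + weight * prosodic, h))
    # Highest combined log-score first; the key avoids comparing the dataclasses.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored]


# Usage: a toy prosodic model that finds a boundary likely only at frame 120.
prosodic_scores = {120: -0.1}


def toy_boundary_loglik(frame: int) -> float:
    return prosodic_scores.get(frame, -3.0)


hyps = [
    PhraseHypothesis("A", acoustic_logprob=-100.0, boundaries=[80]),
    PhraseHypothesis("B", acoustic_logprob=-101.0, boundaries=[120]),
]
ranked = rescore(hyps, toy_boundary_loglik)
print([h.words for h in ranked])  # ['B', 'A']: -101.1 beats -103.0
```

Here hypothesis B has the worse phoneme-recognition score but places its boundary where the prosodic model expects one, so it moves to the top after rescoring.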