Model and example based prosodic feature extraction and its efficient integration for speech recognition along with phoneme-based recognition

Project Abstract

The aim of this research is to exploit the prosodic information contained in speech for automatic speech recognition, where prosodic information plays an important role alongside phonemic information.

(a) Robust pitch determination algorithm: In contrast to conventional pitch trackers based on numerical curve fitting, the proposed method estimates a continuous F0 pattern using a quantitative pitch generation model of the kind often used to synthesize F0 contours from prosodic event commands. An inverse filtering technique is employed to obtain the initial candidates for the prosodic commands. To find the optimal command sequence among these candidates efficiently, a beam-search algorithm and an N-best technique are employed. Preliminary experiments on a male speaker from the ATR B-set database showed promising results in both the quality of the restored pattern and the estimation of the prosodic events. Along with this improvement to the F0 smoothing technique, a novel frame-wise pitch determination algorithm that provides a reliability measure for the pitch frequency was also proposed.

(b) Prosodically guided speech recognition:
i. As a first step toward speech recognition based on prosodic information, an isolated word recognition task in a noisy environment was used. Experiments showed that the word pitch pattern helps reduce ambiguity when discriminating similar words.
ii. It was shown that the dependencies between consecutive phrases can be measured by means of prosodic features; an accuracy of 87% was obtained on the ATR read speech data.
iii. A prototype prosodically guided speech recognition system was developed, in which phrase hypotheses given by phoneme recognition are rescored on the basis of the likelihood of phrase boundaries measured from prosodic features.
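The beam-search and N-best step mentioned in part (a) can be illustrated with a minimal sketch. The abstract does not give the actual objective or the command representation, so everything here is an assumption made for illustration: a command is a hypothetical `(onset_time, amplitude)` pair, each candidate is either kept or dropped in time order, and `toy_score` is a made-up objective that rewards amplitude and penalizes commands closer than 0.2 s.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (onset time in seconds, amplitude).
Command = Tuple[float, float]


def beam_search_nbest(
    candidates: Sequence[Command],
    score: Callable[[Tuple[Command, ...]], float],
    beam_width: int = 4,
    n_best: int = 3,
) -> List[Tuple[float, Tuple[Command, ...]]]:
    """Return up to n_best (score, sequence) pairs, best first.

    Candidates are visited in time order and each one is either kept or
    dropped. After every decision only the beam_width best partial
    sequences survive, so the search is approximate, not exhaustive.
    """
    beam: List[Tuple[Command, ...]] = [()]
    for cand in candidates:
        # Extend each surviving hypothesis two ways: drop or keep the candidate.
        expanded = list(beam) + [hyp + (cand,) for hyp in beam]
        expanded.sort(key=score, reverse=True)
        beam = expanded[:beam_width]
    return [(score(hyp), hyp) for hyp in beam[:n_best]]


def toy_score(seq: Tuple[Command, ...]) -> float:
    """Toy objective (an assumption, not the method's real criterion)."""
    total = sum(amp for _, amp in seq)
    for (t0, _), (t1, _) in zip(seq, seq[1:]):
        if t1 - t0 < 0.2:  # penalize implausibly close commands
            total -= 1.0
    return total


candidates = [(0.10, 0.75), (0.15, 0.25), (0.60, 0.5), (0.65, 0.625)]
results = beam_search_nbest(candidates, toy_score)
for s, seq in results:
    print(s, seq)
```

In the method described above, the score would presumably measure how well the F0 contour generated from a command sequence matches the observed pitch data; the toy objective only stands in for that.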
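The rescoring idea in (b) iii can be sketched as follows. The abstract does not say how the phoneme-recognition score and the prosodic boundary likelihood are combined, so the weighted sum of log-scores below is an assumption, and `Hypothesis`, `rescore`, and `toy_boundary_logprob` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    phoneme_logprob: float   # score from the phoneme-based recognizer
    boundaries: List[float]  # hypothesized phrase-boundary times (seconds)


def rescore(
    hyps: List[Hypothesis],
    boundary_logprob: Callable[[float], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Rank hypotheses by phoneme score plus weighted prosodic boundary score."""

    def combined(h: Hypothesis) -> float:
        prosodic = sum(boundary_logprob(t) for t in h.boundaries)
        return h.phoneme_logprob + weight * prosodic

    return sorted(hyps, key=combined, reverse=True)


def toy_boundary_logprob(t: float) -> float:
    """Toy stand-in: pretend the prosody suggests a boundary near 1.0 s."""
    return -4.0 * abs(t - 1.0)


hyps = [
    Hypothesis("hypothesis A", phoneme_logprob=-10.0, boundaries=[0.5]),
    Hypothesis("hypothesis B", phoneme_logprob=-10.5, boundaries=[1.0]),
]
ranked = rescore(hyps, toy_boundary_logprob)
print([h.text for h in ranked])
```

Here hypothesis B starts with the lower phoneme score but overtakes A because its phrase boundary agrees with the toy prosodic evidence, which is the kind of reordering the rescoring step is meant to produce.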

Project Outputs

Mitsuru Nakai: "Command Estimation for the F0 Control Mechanism Using an F0 Reliability Field" Acoustical Society of Japan, 1998 Spring Meeting. (1998)
Yoshinori Sagisaka: "Computing Prosody" Springer, 401 (1997)
Hiroshi Shimodaira: "Modified Minimum Classification Error Learning and Its Application to Neural Networks" Advances in Pattern Recognition. 785-794 (1998)
Mitsuru Nakai: "On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function" Proc. EuroSpeech '97. 243-246 (1997)
Other grants of SHIMODAIRA Hiroshi

Asynchronous-Transition Hidden Markov Model with State-Tying across Time for Automatic Speech Recognition
  • Grant number:
    12680375
  • Fiscal year:
    2000
  • Funding amount:
    $16,600
  • Project category:
    Grant-in-Aid for Scientific Research (C)