
構造不変の定理に基づく音声アフォーダンスの提案とそれに立脚した音声認識系の構築

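Proposal of speech affordance based on the theorem of structural invariance, and construction of a speech recognition system grounded in it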

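Basic Information

Grant number:
19024023
Principal investigator:
Nobuaki Minematsu (峯松信明)
Amount:
$48,000
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year:
2007
Funding country:
Japan
Period:
2007 to 2008
Project status:
Completed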

Project Abstract

本研究では, 線形・非線形を問わず, あらゆる可逆な変換・写像に対して不変な特徴量であるバタチャリヤ距離を用いた音声認識系について研究を行なった。主な成果は4つある。一つは1)不変量の一般式を導出したこと。即ち, 不変量はf-divergenceでなければならないことを数学的に証明したことである。二つ目は2)話者性による音声の違いを変換・写像として捉えた場合の, その写像関数の推定方法として現在広く使われているGMM法の欠点を明確にし, それを解決する新しい写像推定法を提案したこと, 3)f-divergenceに基づく表象は, 一般に強すぎる不変性を持つ。これは, 対象とする変換群にのみ不変性を示す表象技術を構築する必要があることを意味するが, 部分空間への分割, 及び部分空間での構造化を通してこの問題を解決したこと, 4)更には, 実用アプリケーションとして, 外国語発音評価システムを構築したことである。以下, 各々についてより詳細に示す。バタチャリヤ距離が任意の可逆かつ連続的な変換に対しても不変であることを既に証明されていたが, 本研究では, バタチャリヤ距離の一般形である, f-divergenceも不変性を満たし, また, 不変な尺度はf-divergenceでなければならないという必要性までも証明することに成功した。f-divergenceはバタチャリヤ距離, カルバックライブラ距離など, 様々な分布間距離の一般形として位置づけられており, より本質的な意味に置いて, 不変表象の数学的基盤を構築することができた。f-divergenceは変換不変であるが, 話者の変化はどのような変換関数としてモデル化されるのか? 従来この問題はGMMによる変換関数推定が広く行なわれているが, 本研究では, この従来法の欠点を明確にし, より正しい最適化手法を用いて変換関数推定を行なう手法を提案した。実験的にも提案手法を用いることで, 推定誤差を有意に削減できることを確認した。その一方で, f-divergenceに基づく音声表象は, 不変性が極めて強く, 例えば, 異なる単語が等しいと判定されることが起こりえる。これは, 話者の違いも音韻の違いも同一の物理量を変形することが原因であり, 一種のトレードオフとなる。結局望まれるのは, 話者性だけに不変な制約付きの不変性である。本研究では, 話者性の変換がどのような変換群を構成するのかに着眼し, 限られた変換群のみに対して不変性が成立する手法を提案し, 実験的にその有効性を検証した。また, f-divergenceは事象と事象の差分(間隔)を測る尺度であるため, 事象がN個存在する場合は, N(N-1)/2個の測定量が得られ, パラメータ次元数が容易に増加する。これを削減するために, LDAやPCAの効果的導入をはかり, eigen structureと呼ばれる特徴量表現を提案するに至った。更に, 実用アプリケーションとして, 外国語発音の評価システムを構築した。数年後には全ての公立小学校で英語教育が開始される。ここでは話す/聞く教育がメインとなるが, 例えば発音を指導できる教師は非常に限られている。このような情勢を考慮し, 子どもの声であっても頑健に処理できる音声の構造的表象を用いたCALL(Computer Aided Language Learning)システムの構築を行なった。600名以上の学習者の音声を評価し, 発音カルテと呼ばれる診断書の配布などを行なった。

This study investigated speech recognition systems built on the Bhattacharyya distance, a feature that is invariant under any invertible transformation or mapping, whether linear or nonlinear. There are four main results: 1) we derived the general form of the invariant, proving mathematically that an invariant measure must be an f-divergence; 2) treating speaker-dependent differences in speech as a transformation (mapping), we clarified the shortcomings of the GMM method now widely used to estimate the mapping function and proposed a new mapping estimation method that resolves them; 3) representations based on f-divergence generally have invariance that is too strong, which means a representation must be built that is invariant only to the targeted group of transformations, and we solved this problem through division into subspaces and structuralization within each subspace; and 4) as a practical application, we built a foreign-language pronunciation assessment system. Each result is described in more detail below.

The Bhattacharyya distance had already been proved invariant under any invertible and continuous transformation. In this study we succeeded in proving that f-divergence, the general form of the Bhattacharyya distance, also satisfies invariance, and moreover that any invariant measure must be an f-divergence (the necessity condition). f-divergence is regarded as the general form of various inter-distribution distances, including the Bhattacharyya distance and the Kullback-Leibler divergence, so this result establishes the mathematical foundation of invariant representation in a more fundamental sense.

f-divergence is transformation-invariant, but what kind of transformation function models a change of speaker? Conventionally, this question has been handled by GMM-based estimation of the transformation function. We clarified the shortcomings of this conventional method and proposed a method that estimates the transformation function with a more appropriate optimization technique. Experiments confirmed that the proposed method significantly reduces the estimation error.

On the other hand, speech representations based on f-divergence have extremely strong invariance, so that, for example, different words can be judged identical. The cause is that speaker differences and phonemic differences both deform the same physical quantity, which creates a kind of trade-off. What is ultimately desired is a constrained invariance that holds for speaker identity only. Focusing on what kind of transformation group speaker transformations form, we proposed a method in which invariance holds only for a restricted group of transformations, and verified its effectiveness experimentally.
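The invariance claim in the abstract can be checked numerically in one simple special case. The following is a minimal sketch, not the project's implementation: it assumes each speech event is modeled as a multivariate Gaussian and considers only invertible affine maps x -> Ax + b, which keep Gaussians Gaussian (the abstract's claim covers nonlinear invertible maps as well, which this sketch does not test). The function names, the example means and covariances, and the matrix `A` are all hypothetical.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mean term: (1/8) * diff^T cov^{-1} diff
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    logdet = np.linalg.slogdet(cov)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return mean_term + cov_term


def apply_affine(mu, cov, A, b):
    """Push a Gaussian through the map x -> A x + b."""
    return A @ mu + b, A @ cov @ A.T


# Two hypothetical "speech events" (values chosen only for illustration).
mu_a = np.array([0.0, 1.0, -1.0])
cov_a = np.diag([1.0, 2.0, 0.5])
mu_b = np.array([1.5, 0.0, 0.5])
cov_b = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])

# A hypothetical invertible map (det(A) = 5) standing in for a speaker change.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])
b = np.array([0.5, -2.0, 1.0])

d_before = bhattacharyya_gaussian(mu_a, cov_a, mu_b, cov_b)
d_after = bhattacharyya_gaussian(*apply_affine(mu_a, cov_a, A, b),
                                 *apply_affine(mu_b, cov_b, A, b))
print(d_before, d_after)  # the two values agree up to rounding error
```

The same map is applied to both distributions, which mirrors the idea that a speaker change transforms every event in the utterance at once while the distances between events stay fixed.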
In addition, because f-divergence measures the difference (interval) between one event and another, N events yield N(N-1)/2 measurements, so the parameter dimensionality grows easily. To reduce it, we introduced LDA and PCA and arrived at a feature representation called the eigen structure.

Furthermore, as a practical application, we built an assessment system for foreign-language pronunciation. In a few years, English education will begin in all public elementary schools. Speaking and listening will be the main focus, yet teachers who can teach pronunciation, for example, are very limited. In view of this situation, we built a CALL (Computer Aided Language Learning) system that uses a structural representation of speech able to process even children's voices robustly. We assessed the speech of more than 600 learners and distributed diagnostic reports called pronunciation charts (発音カルテ).
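The N(N-1)/2 growth mentioned in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each event is a one-dimensional Gaussian given as a (mean, variance) pair and that the pairwise measure is the Bhattacharyya distance; the function names and the example values are hypothetical, and the LDA/PCA reduction step is not shown.

```python
import math
from itertools import combinations


def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two 1-D Gaussians."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def structure_vector(events):
    """All pairwise distances between events, one per unordered pair."""
    return [bhattacharyya_1d(*p, *q) for p, q in combinations(events, 2)]


# Five hypothetical events as (mean, variance) pairs; values are illustrative.
events = [(0.0, 1.0), (1.0, 0.5), (2.5, 2.0), (-1.0, 1.5), (4.0, 1.0)]
vector = structure_vector(events)
n = len(events)
print(n, len(vector))  # 5 events -> 10 pairwise distances

# Growth of the dimensionality with the number of events.
for n_events in (10, 20, 40):
    print(n_events, n_events * (n_events - 1) // 2)
```

Doubling the number of events roughly quadruples the number of distances, which is why some form of dimensionality reduction becomes attractive as N grows.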

Project Outcomes

Journal papers (33)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

音声の構造的表象に基づく英語学習者発音の音響分析

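Acoustic analysis of English learners' pronunciation based on the structural representation of speech

DOI:
Publication date:
2007
Journal:
電子情報通信学会論文誌 J90-D-5
Impact factor:
0
Authors:
朝川智;峯松信明;広瀬啓吉
Corresponding author: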
広瀬啓吉

Pronunciation clinic -which part of your pronunciation to correct at first to become like your model speaker?-

DOI:
Publication date:
2008
Journal:
Impact factor:
0
Authors:
N. Minematsu;K. Kamata;M. Takazawa;K. Takeuchi;S. Asakawa;T. Makino;Y. Yamauchi;T. Nishimura;K. Hirose
Corresponding author:
T. Nishimura;K. Hirose

Training of pronunciation as learning of the sound system embedded in the target language

DOI:
Publication date:
2008
Journal:
Proc. Int. Symposium on Phonetic Frontiers (CD-ROM)
Impact factor:
0
Authors:
R. Kawai;A. Kashihara;大高泉;N. Minematsu
Corresponding author:
N. Minematsu

Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons

DOI:
10.1109/icassp.2008.4518528
Publication date:
2008-05
Journal:
2008 IEEE International Conference on Acoustics, Speech and Signal Processing
Impact factor:
0
Authors:
Y. Qiao;Naoya Shimomura;N. Minematsu
Corresponding author:
Y. Qiao;Naoya Shimomura;N. Minematsu

Structural assessment of language learners' pronunciation
