
Person Recognition by Multi-modal Information

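Basic Information

Grant number:
09680394
Principal investigator:
KITAMURA Tadashi
Amount:
$21,100
Host institution:
Nagoya Institute of Technology (NIT)
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
1997
Funding country:
Japan
Project period:
1997 to 1998
Project status:
Completed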

Project Summary

1. We proposed a new technique for person recognition using bimodal information comprising speech and facial images. The method applies a Hidden Markov Model (HMM) to the image sequence of lip movement recorded while a word is spoken. We studied intensity and location normalization algorithms and obtained a recognition accuracy of about 95% on the bimodal database Tulips1 (12 persons, 4 English digit words). We also proposed a new normalization algorithm and showed that it requires less computation than the one we had proposed previously.

2. We also applied the proposed method to M2VTS, a bimodal database larger than Tulips1 that consists of 10 digit words spoken by 37 persons. Furthermore, we studied HMM-based algorithms for normalizing facial images and tracking lip location. We carried out spoken word recognition and speaker identification experiments using lip-reading information only. The experimental results showed that intensity and location normalization is very effective. For the 37 persons, we obtained a speaker identification rate of 81.0% using the single word "0" and a word recognition rate of 74.2% for the 10 digits.

3. For speaker identification using speech, we proposed a new spectral parameter estimation method that utilizes the phase characteristics of a second-order all-pass warping function. This method can change the frequency resolution of the speech spectrum in an arbitrary region. Using the proposed method, we carried out speaker recognition experiments based on discriminative feature extraction (DFE), which optimizes the spectral warping function for speaker recognition, and compared the results with those of conventional methods. The experimental results showed that the proposed method is more effective than conventional methods and that the spectrum around 2 kHz is very important for speaker identification.
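
The frequency warping mentioned in item 3 can be illustrated with a minimal sketch. The summary does not give the parameterization used in the project, so the following is an assumption: it takes a textbook second-order all-pass filter with poles at r·exp(±jθ) and uses minus half of its phase response as the warped frequency. The function names (`allpass2_warp`, `hz_to_rad`), the 16 kHz sampling rate, and the value r = 0.4 are hypothetical choices for illustration. The spectral parameter estimation and the DFE optimization of the warping function are not reproduced here.

```python
import math


def allpass2_warp(omega, r, theta):
    """Warped frequency: minus half the phase of a second-order all-pass filter.

    The filter is assumed to have poles at r*exp(+/- j*theta), 0 <= r < 1.
    All frequencies are in radians per sample, in [0, pi].
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1 for a stable filter")

    def section(phi):
        # Phase contribution of one pole; the second argument is always
        # positive because r < 1, so the result is continuous in phi.
        return math.atan2(r * math.sin(phi), 1.0 - r * math.cos(phi))

    return omega + section(omega - theta) + section(omega + theta)


def hz_to_rad(f_hz, fs_hz):
    """Convert a frequency in Hz to radians per sample."""
    return 2.0 * math.pi * f_hz / fs_hz


if __name__ == "__main__":
    fs = 16000.0                     # assumed sampling rate
    theta = hz_to_rad(2000.0, fs)    # centre the extra resolution near 2 kHz
    for f in (0, 1000, 2000, 3000, 4000, 8000):
        w = allpass2_warp(hz_to_rad(f, fs), 0.4, theta)
        print(f"{f:5d} Hz -> {w * fs / (2 * math.pi):8.1f} Hz (warped)")
```

With r = 0 the mapping is the identity. As r grows, the slope of the mapping around θ exceeds 1, so a uniform grid on the warped axis samples the original spectrum more densely near θ. In this sketch, that is the sense in which the resolution of a chosen frequency region is raised.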
