Person Recognition by Multi-modal Information
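Basic information
- Grant number: 09680394
- Principal investigator:
- Amount: $21,100
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (C)
- Fiscal year: 1997
- Funding country: Japan
- Period: 1997 to 1998
- Project status: Completed
- Source:
- Keywords:
Project abstract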
1. We proposed a new technique for person recognition using bimodal information comprising speech and facial images. The proposed method applies a hidden Markov model (HMM) to an image sequence of the lip movement of a spoken word. We studied intensity and location normalization algorithms and obtained a recognition accuracy of about 95% on the bimodal database Tulips1 (12 persons, 4 English digit words). We also proposed a new normalization algorithm and showed that it requires less computation than the one we proposed earlier.
2. We also applied the proposed method to M2VTS, a bimodal database larger than Tulips1 that consists of 10 digit words spoken by 37 persons. Furthermore, we studied HMM-based algorithms for normalizing facial images and tracking lip location. We carried out spoken word recognition and speaker identification experiments using only lip-reading information. The experimental results showed that intensity and location normalization is very effective. For the 37 persons, we obtained a speaker identification rate of 81.0% using the single word "0" and a word recognition rate of 74.2% for the 10 digits.
3. For speaker identification using speech, we proposed a new spectral parameter estimation method that uses the phase characteristics of a second-order all-pass warping function. This method can change the frequency resolution of the speech spectrum in an arbitrary region. Using the proposed method, we carried out speaker identification experiments based on discriminative feature extraction (DFE), which optimizes the spectral warping function for speaker recognition, and compared the results with conventional methods. The results showed that the proposed method is more effective than conventional methods and that the spectrum around 2 kHz is very important for speaker identification.
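The abstract does not give the form of the second-order all-pass warping function, so the following is only a minimal sketch of the general idea and not the project's actual formulation. It assumes the warped frequency is half the negated phase of a second-order all-pass section with conjugate poles at r·exp(±jθ). The function names, the pole radius of 0.5, and the 10 kHz sampling rate are all hypothetical choices made for illustration.

```python
import math


def warp(omega, r, theta):
    """Warped frequency for omega in [0, pi].

    Assumption: half the negated phase of a second-order all-pass section
    with poles at r*exp(+/- j*theta), 0 <= r < 1. Maps 0 -> 0 and pi -> pi.
    """
    def half_phase(x):
        # arg(1 - r*exp(-j*x)); the real part is positive because r < 1
        return math.atan2(r * math.sin(x), 1.0 - r * math.cos(x))

    return omega + half_phase(omega - theta) + half_phase(omega + theta)


def resolution_gain(omega, r, theta):
    """Derivative of warp() with respect to omega.

    A value above 1 means the frequency axis is stretched there, i.e. the
    warped spectrum has finer resolution around that frequency.
    """
    def term(x):
        return (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(x) + r * r)

    return 0.5 * (term(omega - theta) + term(omega + theta))


# Hypothetical setup: emphasize the region around 2 kHz at a 10 kHz sampling rate.
SAMPLE_RATE_HZ = 10_000.0
CENTER_HZ = 2_000.0
theta = 2.0 * math.pi * CENTER_HZ / SAMPLE_RATE_HZ
r = 0.5

for hz in (500.0, 2_000.0, 4_500.0):
    w = 2.0 * math.pi * hz / SAMPLE_RATE_HZ
    print(f"{hz:7.0f} Hz  warped={warp(w, r, theta):.3f} rad  "
          f"gain={resolution_gain(w, r, theta):.3f}")
```

Under these assumptions, moving θ shifts the region of increased resolution and r controls how strongly it is emphasized; with r = 0 the warping reduces to the identity.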
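Project outcomes
Number of journal papers (76)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)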
Jun Hiroi et al.: "Lip Image Sequence Generation Using HMM" Proc. of 1999 Spring Meeting of ASJ. 2-P-22. 311-312 (1999)
Tanaka, Vanegas, Tokuda, Kitamura: "Intensity/Location Normalization for Automatic Lipreading" International Conference of Signal Processing. ICSP98. 920-923 (1998)
Miyajima, Sugiura, Tokuda, Kitamura: "Discrete or Tied-Mixture HMM Based on Self-Organizing Feature Map for Robust Probability Estimation" International conference on speech signal processing, Proc. of ICSP97. 529-532 (1997)
Oscar Vanegas et al.: "HMM-Based Visual Speech Recognition Using Intensity and Location Normalization" Proceedings of International Conference on Spoken Language Processing (ICSLP98). 289-292 (1998)
Takayoshi Yoshimura et al.: "State Duration Modeling for HMM-Based Synthesis" IEICE Tech. Report. SP98-64. 45-50 (1998)
Other grants by KITAMURA Tadashi
Subunit modeling for Japanese sign language recognition based on stochastic model
- Grant number: 22500506
- Fiscal year: 2010
- Funding amount: $21,100
- Project category: Grant-in-Aid for Scientific Research (C)
Speech synthesis based on eigenvoices: aiming to realize diverse voice qualities
- Grant number: 12680380
- Fiscal year: 2000
- Funding amount: $21,100
- Project category: Grant-in-Aid for Scientific Research (C)
The Studies on the folklore concerning cattle raising in Chugoku mountain area
- Grant number: 09610316
- Fiscal year: 1997
- Funding amount: $21,100
- Project category: Grant-in-Aid for Scientific Research (C)
The Formation and Diffusion of the Folk Cultures in Chugoku Mountains
- Grant number: 05610250
- Fiscal year: 1993
- Funding amount: $21,100
- Project category: Grant-in-Aid for General Scientific Research (C)
Word Recognition using A Two-Dimensional Mel-cepstrum under Noisy Environments.
- Grant number: 63550253
- Fiscal year: 1988
- Funding amount: $21,100
- Project category: Grant-in-Aid for General Scientific Research (C)
On the Value Orientations of Okinawa in the Light of Dynamics of "Monchu" System.
- Grant number: 60510151
- Fiscal year: 1985
- Funding amount: $21,100
- Project category: Grant-in-Aid for General Scientific Research (C)
Similar overseas grants
An investigation of generative acoustic latent representations for meeting speech recognition and summarization
- Grant number: 24K15004
- Fiscal year: 2024
- Funding amount: $21,100
- Project category: Grant-in-Aid for Scientific Research (C)
Disrupter or enabler? Assessing the impact of using automatic speech recognition technology in interpreter-mediated legal proceedings
- Grant number: 2889440
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Studentship
Analysis of speech recognition as a tool in medical English education
- Grant number: 23K00767
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Grant-in-Aid for Scientific Research (C)
Automatic Speech Recognition (ASR) engine to improve autistic children speech
- Grant number: 10056712
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Grant for R&D
Industrial research into the reduction of biases in foundational Automatic Speech Recognition models.
- Grant number: 10068091
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Collaborative R&D
M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition
- Grant number: 23K11227
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of intraoperative education model using speech recognition and language information processing technology
- Grant number: 23K16281
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Grant-in-Aid for Early-Career Scientists
SaTC: CORE: Small: Robust Speaker and Speech Recognition Under AI-Driven Physical and Digital Attacks
- Grant number: 2310207
- Fiscal year: 2023
- Funding amount: $21,100
- Project category: Continuing Grant
A State-of-the-Art Automatic Speech Recognition and Conversational Platform to Enable Socially Assistive Robots for Persons with Alzheimer's Disease and Related Dementias
- Grant number: 10699887
- Fiscal year: 2023
- Funding amount: $21,100
- Project category:
CRCNS US-Spain Research Proposal: Collaborative Research: Tracking and modeling the neurobiology of multilingual speech recognition
- Grant number: 2207770
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Continuing Grant