
Study on Multimedia Teaching Material by Hyper-Media and Content Analysis of Lecture Videos

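Basic Information

Grant number:
11480081
Principal investigator:
YASUO Ariki
Amount:
$83,200
Host institution:
Ryukoku University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
1999
Funding country:
Japan
Project period:
1999 to 2001
Project status:
Completed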

Project Abstract

The main results of this study, conducted from 1999 to 2001, are summarized in the following seven points:
1. Indexing of spoken documents based on high-speed, high-accuracy speech recognition: Speech decoding based on phoneme or word error minimization was proposed, together with iterative unsupervised adaptation of acoustic models.
2. Indexing of noisy speech based on speech recognition that is robust to noise and background music (BGM): Speech recognition with reduction of both stationary and non-stationary noise was proposed, using a Kalman filter and maximum likelihood linear regression (MLLR).
3. Speaker indexing based on speaker recognition: Speaker recognition based on phoneme and speaker separation was proposed, and individual speakers were indexed with this method.
4. Indexing of video images based on character and image recognition: Recognition of video captions and flips was proposed, consisting of caption frame selection, effective binarization, and OCR.
5. Topic segmentation based on speech and character recognition: News videos were segmented into individual topics using the word space method proposed in this study. For commercial video, a topic segmentation method was proposed that integrates video caption recognition and speech recognition.
6. Structuring and content description of video images: Tables of contents for video images were produced after indexing and topic segmentation.
7. Topic retrieval and summarization for hyperlink construction: Cross-media retrieval was proposed, as well as extraction of video clips in which a specific person speaks about a specific topic.
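
The abstract does not describe how the word space method of item 5 works, so the following is only a minimal sketch of one generic way to segment a recognized transcript into topics: compare bag-of-words vectors of adjacent sentence windows and place a boundary where their cosine similarity drops. It is not the method proposed in the study. The function names, the stop-word list, the window size, and the threshold are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the study).
STOPWORDS = {"the", "a", "in", "is", "will", "were",
             "through", "after", "may", "say"}


def bag_of_words(sentences):
    """Count lowercase word tokens in a list of sentences, skipping stop words."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_topics(sentences, window=2, threshold=0.1):
    """Return each index i where a topic boundary falls before sentences[i].

    A boundary is placed wherever the `window` sentences before a gap and
    the `window` sentences after it have cosine similarity below
    `threshold`. Both defaults are hypothetical values for illustration.
    """
    boundaries = []
    for i in range(window, len(sentences) - window + 1):
        left = bag_of_words(sentences[i - window:i])
        right = bag_of_words(sentences[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical transcript: three sentences on an election, three on weather.
transcript = [
    "The election results were announced today",
    "Voters chose a new mayor in the election",
    "The mayor thanked voters after the election",
    "Heavy rain is expected tomorrow",
    "The rain will continue through the weekend",
    "Forecasters say the rain may cause flooding",
]
print(segment_topics(transcript))  # → [3]
```

In this toy transcript the only gap with no shared content words is the one before the fourth sentence, so a single boundary is reported there. Real recognizer output contains errors and a far larger vocabulary, so the threshold and window would need tuning on actual data.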
