Development of a supporting system for creation of educational video contents using robust automatic speech recognition technology

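Basic information

Grant number:
14580246
Principal investigator:
KANEDERA Noboru
Amount:
$13,400
Host institution:
Ishikawa National College of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2002
Funding country:
Japan
Project period:
2002 to 2004
Project status:
Completed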

Project abstract

We developed a system that supports the creation of educational video content. The system automatically segments a lecture video into subtopics based on the speech signal. To represent the subtopic of each video scene, the text produced from the lecture speech by automatic speech recognition (ASR) was converted into an index using independent component analysis (ICA) instead of conventional TF-IDF. We tested a segmentation method that uses dynamic programming to minimize the sum of cosine distances between adjacent indexes representing the subtopics of video scenes. The method was evaluated on sample lecture videos given by five lecturers. The results indicated that scene segmentation based on ASR output performed as well as segmentation based on transcribed text.

Editing a video requires locating subtopic boundaries and then extracting the necessary segments or removing the unnecessary ones. Locating the boundaries is particularly laborious, because the editor has to review the video from beginning to end. Automatic subtopic segmentation was therefore expected to reduce editing time and effort. To confirm this effect, we carried out a subjective evaluation with 16 participants and 5 lecture videos. As a result, 75% of the participants answered that editing with automatic subtopic segmentation was better than editing without it. Moreover, the average editing time was reduced by about 14%.
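
The dynamic-programming segmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact cost function, so the sketch below makes an assumption: each segment is scored by the sum of cosine distances between its utterance vectors and the segment mean, and the number of segments is fixed in advance. The function names (`cosine_distance`, `segment_cost`, `segment_subtopics`) and the toy vectors are hypothetical and are not taken from the project; in the project the vectors would be ICA-based indexes of ASR output.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def segment_cost(vectors, start, end):
    """Assumed cost: distance of each vector in [start, end) to the segment mean."""
    count = end - start
    dim = len(vectors[0])
    mean = [sum(vectors[i][d] for i in range(start, end)) / count
            for d in range(dim)]
    return sum(cosine_distance(vectors[i], mean) for i in range(start, end))


def segment_subtopics(vectors, num_segments):
    """Split the sequence into num_segments contiguous segments of minimum
    total cost and return the start index of every segment after the first."""
    n = len(vectors)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(vectors)")
    inf = float("inf")
    # best[k][j]: minimum cost of splitting the first j vectors into k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == inf:
                    continue
                cost = best[k - 1][i] + segment_cost(vectors, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i
    # Walk the back-pointers from the end to recover the boundaries.
    boundaries = []
    j = n
    for k in range(num_segments, 0, -1):
        i = back[k][j]
        if k > 1:
            boundaries.append(i)
        j = i
    return sorted(boundaries)


if __name__ == "__main__":
    # Toy vectors: three utterances on one topic, then three on another.
    toy = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
           [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]]
    print(segment_subtopics(toy, 2))  # → [3]
```

This sketch recomputes each segment cost from scratch, which is adequate for a short lecture but would need precomputed costs for long recordings.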


Project outputs

Journal papers (72)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Subtopic segmentation in the lecture speech.

DOI:
Publication year:
2004
Journal:
Proceedings of International Conference on Spoken Language Processing Vol.III
Impact factor:
0
Authors:
N.Kanedera;A.Sumida;T.Ikehata;T.Funada
Corresponding author:
T.Funada

音声による講義ビデオシーン分割方法の検討

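A study of speech-based methods for lecture video scene segmentation

DOI:
Publication year:
2003
Journal:
日本音響学会2003年春季研究発表会講演論文集 I
Impact factor:
0
Authors:
藤井諭;渡部徹;吉田幸二;酒井三四郎;水野忠則;金寺登
Corresponding author:
金寺登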

Lecture speech recognition and lecture video segmentation.

DOI:
Publication year:
2003
Journal:
Technical Report of the Japanese Society for Artificial Intelligence SIG-SLUD-A302
Impact factor:
0
Authors:
N.Kanedera;A.Sumida;J.Jikeya;T.Ikehata;T.Funada
Corresponding author:
T.Funada

Lecture video segmentation derived from speech by dynamic programming.
