Automatic indexing for lecture speech and its advanced utilization through speech interaction

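Basic information

Grant number:
17300064
Principal investigator:
NAKAGAWA Seiichi
Amount:
$102,600
Host institution:
Toyohashi University of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2005
Funding country:
Japan
Project period:
2005 to 2007
Project status:
Completed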

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-17300064/
Keywords:
classroom lecture speech; speech recognition; spoken language; language model; speech summarization; indexing; speech retrieval; browsing; spoken documents

Project summary

We collected classroom lecture speech from 16 speakers, comprising 114 lectures and 3,860 minutes, and published it as a corpus. We developed procedures for automatic speech recognition, sentence extraction, segmentation/indexing, and speech retrieval, and built a lecture browsing system for classroom lecture data from our university's graduate courses. These processes are necessary to improve the usability of broadcast audio and video data. In the case of lectures, summarized and indexed lecture speech or video enables students to learn more effectively. Our goal was to construct a framework for such structured lecture content. To achieve this goal, we first investigated the influence of recording methods on speech recognition performance. It turned out that there was a 23% difference in accuracy between a high-quality hand-held microphone and a low-quality lapel microphone. Furthermore, we improved the domain-dependent language model by using related Web texts and developed a filler insertion model. Second, we tried automatic summarization by extracting important sentences and obtained κ values of 0.319-0.456, comparable to the 0.407-0.477 obtained by humans. Finally, we constructed a lecture browsing system that uses the results of the procedures described above to help users learn more effectively, and evaluated it.
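
The κ values above measure agreement on which sentences count as important. The summary does not say which κ statistic was used or how the labels were encoded, so the following is only a minimal sketch, assuming Cohen's κ over binary per-sentence labels (1 = selected as important, 0 = not selected). The function name `cohen_kappa` and the toy label lists are hypothetical and are not data from the project.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of sentences where both labelers agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each labeler's marginal label frequencies,
    # summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both labelers used a single identical label; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: 10 sentences, labeled by a system and by a human.
system = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
human = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]

kappa = cohen_kappa(system, human)
print(f"kappa = {kappa:.3f}")
```

In this toy case the two labelers agree on 8 of 10 sentences (observed agreement 0.8) and the chance agreement is 0.52, so κ = (0.8 - 0.52) / (1 - 0.52) ≈ 0.583. Comparing a system-versus-human κ range with a human-versus-human κ range, as the summary does, shows whether the system agrees with people about as well as people agree with each other.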