複数話者の音声コミュニケーションの意図・状況理解

Understanding Intention and Situation in Multi-Speaker Speech Communication

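Basic Information

Grant number:
13224057
Principal investigator:
河原達也 (Tatsuya Kawahara)
Amount:
--
Host institution:
Kyoto University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research on Priority Areas (C)
Fiscal year:
2001
Funding country:
Japan
Project period:
2001 to (no data)
Project status:
Completed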

Project Summary

会議・会話など人間どうしの音声を対象として、音声データの収集を行うとともに、その音響的・言語的そしてコミュニケーションの観点からのモデル化を行った。まず会議音声を対象として、階層的なアーカイブを構築し、議事録の作成支援を行うシステムを設計した。GMMによる話者識別を行い、その結果により音声を分割するとともに、話者IDや時間情報などのインデックスを生成する。また談話標識を含むキーフレーズの検出により議論の結論となる発話を特定し、議事次第や会議の配布資料などに含まれる話題依存語彙を利用して、これを自動的に書き起こし、議事録のドラフトとする。以上により音声・インデックス・テキストの3層からなるアーカイブを構築することができる。次に、この談話標識に基づく自動インデキシングを大規模な講演音声コーパスに対して適用・評価を行った。学習データの講演の書き起こしからポーズ情報を用いてセクション境界候補を検出し、統計的言語モデルを用いて句点を挿入して、各セクションの先頭の一文を抽出する。その中に含まれる名詞から単語頻度と文頻度に基づいて談話標識を選定する。これらの過程は人手によるタグを必要としない教師なし学習により行われる。評価データの各文について談話標識の単語頻度と文頻度の統計量に基づく評価値を計算し、その合計が閾値以上であればインデックスを付与する。実際の講演音声の書き起こしと音声認識結果に対して評価を行った結果、再現率85%程度(適合率は20%程度)の精度で話題セクション境界を自動検出することができた。

Targeting human-to-human speech such as meetings and conversations, we collected speech data and modeled it from acoustic, linguistic, and communicative perspectives. First, for meeting speech, we designed a system that builds a hierarchical archive and supports the preparation of meeting minutes. The system performs speaker identification with GMMs, segments the audio according to the result, and generates indexes such as speaker IDs and time information. It also identifies utterances that state the conclusions of a discussion by detecting key phrases containing discourse markers, transcribes them automatically using topic-dependent vocabulary drawn from the agenda and meeting handouts, and uses the result as a draft of the minutes. In this way, an archive consisting of three layers (audio, index, and text) can be built. Next, we applied this discourse-marker-based automatic indexing to a large-scale corpus of lecture speech and evaluated it. From transcripts of the lectures in the training data, section boundary candidates are detected using pause information, sentence-ending periods are inserted using a statistical language model, and the first sentence of each section is extracted. Discourse markers are then selected from the nouns in those sentences on the basis of word frequency and sentence frequency. These steps are carried out by unsupervised learning that requires no manual tagging. For each sentence in the evaluation data, a score based on the word-frequency and sentence-frequency statistics of the discourse markers is computed, and an index is assigned if the total is at or above a threshold. In evaluations on both manual transcripts of real lecture speech and speech recognition output, topic section boundaries were detected automatically with a recall of about 85% (and a precision of about 20%).
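
The discourse-marker indexing step described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the abstract does not give the scoring formula, so the TF-IDF-style weight used here (word frequency in section-initial sentences times a smoothed inverse sentence frequency) is an assumption, as are all function names and the `top_k` and `threshold` parameters. Sentences are assumed to arrive already tokenized and reduced to nouns; pause detection, period insertion, and speech recognition are out of scope.

```python
import math
from collections import Counter


def select_markers(first_sentences, all_sentences, top_k=10):
    """Pick candidate discourse markers without manual labels.

    Assumed weighting (not taken from the source): frequency of a word in
    section-initial sentences, multiplied by a smoothed inverse sentence
    frequency computed over all sentences.
    """
    word_freq = Counter(w for s in first_sentences for w in s)
    sent_freq = Counter(w for s in all_sentences for w in set(s))
    n = len(all_sentences)
    weights = {
        w: word_freq[w] * math.log((n + 1) / (sent_freq[w] + 1))
        for w in word_freq
    }
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_k])


def sentence_score(sentence, markers):
    """Sum the weights of the marker words that occur in the sentence."""
    return sum(markers.get(w, 0.0) for w in sentence)


def detect_boundaries(sentences, markers, threshold):
    """Return indices of sentences whose score is at or above the threshold."""
    return [
        i for i, s in enumerate(sentences)
        if sentence_score(s, markers) >= threshold
    ]


def precision_recall(detected, reference):
    """Precision and recall of detected boundary indices against a reference."""
    hits = len(set(detected) & set(reference))
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

In a sketch like this, lowering `threshold` marks more sentences as boundaries, which tends to raise recall at the cost of precision; the reported figures (about 85% recall, about 20% precision) would be consistent with such an operating point, though the abstract does not state how the threshold was chosen.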