A Study on Corpus-Based Natural Language Disambiguation
(Original title: コーパスに基づく自然言語の曖昧性解消に関する研究)

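Basic information

Grant number:
07780312
Principal investigator:
Fumiyo Fukumoto (福本文代)
Amount:
$7,000
Host institution:
University of Yamanashi
Host institution country:
Japan
Project category:
Grant-in-Aid for Encouragement of Young Scientists (A)
Fiscal year:
1995
Funding country:
Japan
Project period:
1995 to (no data)
Project status:
Completed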

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-07780312/
Keywords:
corpus; attachment ambiguity; statistical method; smoothing method; similarity

Project abstract

This study addresses the problem of prepositional phrase (PP) attachment ambiguity in English and proposes a method that resolves it using knowledge acquired automatically from a corpus. The work has three main features, chosen to highlight how it differs from related research.

1. The information needed for disambiguation is extracted from part-of-speech-tagged text. When extracting disambiguation cues from large amounts of text, most studies rely on parsed text. However, because existing parsers lack sufficient knowledge for parsing, the types of corpora that can be used are limited, or the parse results have to be produced by hand. This study takes a part-of-speech-tagged corpus as input and extracts the necessary knowledge from it, which avoids these problems.

2. For low-frequency words, semantic relations are obtained by analogy. During disambiguation, the extracted knowledge sometimes cannot be applied to words that occur rarely in the corpus. For such words the method uses analogy: it selects, from the extracted knowledge, the word that is semantically closest to the word in question and resolves the ambiguity with it. With this method, accuracy increased by 40%.

3. The method is also applicable to the scope ambiguity of relative clauses and to noun phrase attachment. Many methods proposed so far for extracting information from corpora measure the strength of the semantic relation between two words. Because this measure is insufficient for disambiguation, we proposed a formula that computes the semantic relation among N words. This not only raises the resolution rate for PP attachment ambiguity but also makes the method applicable to more complex problems that need information about more words, such as relative clause scope and noun phrase attachment.

In experiments with the prepositions 'for', 'in', and 'with', we compared two-word semantic relations against three-word semantic relations as the information used for disambiguation: the former achieved 49% accuracy, while the latter achieved 70.1% (see paper 1). To evaluate analogy, we also compared our method with the smoothing method proposed by Dagan: the smoothing method achieved 57.6% accuracy, while our method achieved 63.5% (see paper 2).
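
The abstract does not give the project's scoring formula or similarity measure, so the following is only a minimal sketch of the general idea it describes: score both candidate attachments with three-word statistics, and fall back to the most similar known word when the object is too rare. Everything in it is an assumption made for illustration: the toy triples, the relative-frequency score, the cosine similarity over (head, preposition) contexts, and the function names `score`, `backoff_score`, and `attach`.

```python
from collections import Counter
from math import sqrt

# Made-up (head, preposition, object) triples standing in for knowledge
# extracted from a POS-tagged corpus. Not data from the project.
TRIPLES = [
    ("eat", "with", "fork"), ("eat", "with", "fork"), ("eat", "with", "spoon"),
    ("pizza", "with", "cheese"), ("pizza", "with", "cheese"),
    ("pizza", "with", "olive"), ("salad", "with", "olive"),
    ("cut", "with", "knife"), ("cut", "with", "fork"),
]

triple_counts = Counter(TRIPLES)
pair_counts = Counter((h, p) for h, p, _ in TRIPLES)


def score(head, prep, obj):
    """Assumed 3-word score: relative frequency of obj given (head, prep)."""
    denom = pair_counts[(head, prep)]
    return triple_counts[(head, prep, obj)] / denom if denom else 0.0


def context_vector(obj):
    """Counts of the (head, prep) contexts in which obj was observed."""
    return Counter((h, p) for h, p, o in TRIPLES if o == obj)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def backoff_score(head, prep, obj):
    """Use the direct score if the triple was seen; otherwise borrow the score
    of the most similar object seen with (head, prep), scaled by similarity."""
    direct = score(head, prep, obj)
    if direct > 0:
        return direct
    target = context_vector(obj)
    seen = sorted({o for h, p, o in TRIPLES if (h, p) == (head, prep)})
    sim, nearest = max(
        ((cosine(target, context_vector(o)), o) for o in seen),
        default=(0.0, None),
    )
    return sim * score(head, prep, nearest) if nearest else 0.0


def attach(verb, noun, prep, obj):
    """Decide whether 'prep obj' attaches to the verb or to the noun."""
    v = backoff_score(verb, prep, obj)
    n = backoff_score(noun, prep, obj)
    if v == n:
        return "undecided"
    return "verb" if v > n else "noun"
```

In this toy setup, `attach("cut", "salad", "with", "spoon")` has no direct evidence for either attachment; the verb reading wins only because "spoon" shares a context with "fork", which was seen with "cut with". A two-word baseline of the kind the abstract compares against would look only at pairs such as (head, prep) and ignore the object.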