
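Research on the Structuring of Word-Formation Elements in Medical Terms and Their Learning Difficulty

Original title: 医学用語における語構成要素の構造化と学習難易度に関する研究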

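Basic Information

Grant number:
20K12552
Principal investigator:
内山清子
Amount:
$18,300
Host institution:
Shonan Institute of Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2020
Funding country:
Japan
Project period:
2020-04-01 to 2024-03-31
Project status:
Completed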

Project Abstract

The purpose of this research is to help learners who aim to become nurses (including non-Japanese learners) understand medical terms efficiently. To that end, the project structures medical terms by attaching various kinds of information to them, sets learning difficulty levels on the basis of the structured data from the viewpoints of word-formation productivity, learning frequency, and explanatory power, and verifies the effectiveness of those levels. So far, data from nursing textbooks have been collected, medical terms extracted, and the word-formation elements of those terms analyzed. The medical terms under analysis were organized by frequency of occurrence, after comparing the terms listed in existing general dictionaries with those listed in medical terminology dictionaries.

This fiscal year, the aim was to create structured data for medical terms based on frequency of occurrence, position of occurrence in the textbooks, role within the sentence, and position within compound words, and then to set learning difficulty levels from that structured data and verify their effectiveness. Of the words extracted from nursing textbooks, 6,753 were used: words with a frequency of 30 or more, and words with a frequency below 30 that nevertheless appeared in existing dictionaries or in the index of the national examination. Three kinds of data were prepared. Classification experiments on difficulty were run with machine learning using a version in which the collected data were normalized and a version in which the frequencies of attachment to case particles were converted to tf-idf and the remaining features were normalized.

Difficulty was defined on four levels. 1: medical terms included in general dictionaries; 2: terms that occur infrequently but are important; 3: basic terms frequently used in nursing dictionaries; 4: terms in which several words of difficulty 1 to 3 are combined as word-formation elements. Classifiers were built with SVM and random forest. The features used for classification included case particles and the positions in which word-formation elements occur. Classification accuracy was good for difficulty levels 1 and 3, which consist of high-frequency terms. Nursing textbooks alone, however, did not yield many occurrence patterns, so the results for the low-frequency levels 2 and 4 were not very good.
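
The feature preparation mentioned in the abstract (tf-idf weighting of case-particle attachment frequencies, normalization of the remaining features) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the toy counts, the feature names `freq` and `compound_head`, the particular tf-idf variant (relative frequency times log(N/df), with each term treated as a document and each case particle as a word), and the use of min-max scaling as the normalization are all assumptions, since the abstract does not specify them.

```python
import math

# Hypothetical toy data: how often each term is followed by each case particle.
particle_counts = {
    "血圧": {"が": 12, "を": 30, "に": 3},
    "心不全": {"が": 4, "の": 9},
    "褥瘡": {"を": 2, "の": 1},
}

# Hypothetical remaining features (raw values): corpus frequency and the number
# of times the term appears at the head of a compound word.
other_features = {
    "血圧": {"freq": 450, "compound_head": 20},
    "心不全": {"freq": 120, "compound_head": 5},
    "褥瘡": {"freq": 30, "compound_head": 0},
}


def tfidf(counts):
    """Weight particle counts as (count / term total) * log(N / df)."""
    n_terms = len(counts)
    # Document frequency: number of terms whose profile contains the particle.
    df = {}
    for profile in counts.values():
        for particle, count in profile.items():
            if count > 0:
                df[particle] = df.get(particle, 0) + 1
    weights = {}
    for term, profile in counts.items():
        total = sum(profile.values())
        weights[term] = {
            particle: (count / total) * math.log(n_terms / df[particle])
            for particle, count in profile.items()
            if count > 0
        }
    return weights


def min_max_normalize(features):
    """Scale every feature to [0, 1] across terms (assumed normalization)."""
    names = sorted({name for row in features.values() for name in row})
    lo = {n: min(row.get(n, 0) for row in features.values()) for n in names}
    hi = {n: max(row.get(n, 0) for row in features.values()) for n in names}
    return {
        term: {
            n: 0.0 if hi[n] == lo[n] else (row.get(n, 0) - lo[n]) / (hi[n] - lo[n])
            for n in names
        }
        for term, row in features.items()
    }


def build_vectors(counts, features, particles):
    """Concatenate tf-idf particle weights with the normalized features."""
    weights = tfidf(counts)
    scaled = min_max_normalize(features)
    names = sorted({name for row in features.values() for name in row})
    return {
        term: [weights[term].get(p, 0.0) for p in particles]
        + [scaled[term][n] for n in names]
        for term in counts
    }


particles = ["が", "を", "に", "の"]
vectors = build_vectors(particle_counts, other_features, particles)
for term, vector in vectors.items():
    print(term, [round(v, 4) for v in vector])
```

Vectors of this shape, paired with the four difficulty labels, could then be passed to an SVM or random forest classifier from a library such as scikit-learn. That step is left out here because the abstract gives no details about hyperparameters or the evaluation setup.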