Integrated Ensemble Learning with Embedded Vectors in Authorship Attribution

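Basic Information

Grant number:
22K12726
Principal investigator:
金明哲
Amount:
$26,600
Host institution:
Doshisha University
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (C)
Fiscal year:
2022
Funding country:
Japan
Project period:
2022-04-01 to 2025-03-31
Project status:
Not yet concluded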

Project Summary

This fiscal year, we first collected several BERT models pre-trained on large-scale Japanese data (Kyoto University BERT, Tohoku University BERT, NICT BERT, Asahi BERT, Aozora Bunko BERT, Aozora Bunko + Wikipedia BERT, and others). BERT is a type of embedding vector grounded in deep learning theory. We then ran implementation experiments with these models.

In parallel, we built the corpora needed to compare, on the authorship attribution task, BERT models built from different pre-training data. One corpus contains 20 novels each by 10 authors drawn from Aozora Bunko. The other contains 20 novels each by 10 major literary authors not included in Aozora Bunko, which we digitized.

We then carried out a comparative analysis of the collected pre-trained BERT models on these corpora. The results established the following:
(1) Pre-trained BERT models are effective for authorship attribution, although some of them cannot be adapted to this task.
(2) For authors whose works are in Aozora Bunko, the BERT models built from Aozora Bunko achieve high accuracy.
(3) For authors outside Aozora Bunko, the BERT models built from Aozora Bunko are less accurate than they are on the Aozora Bunko authors in (2).
(4) On both experimental corpora, BERT models trained on Aozora Bunko are more accurate than those trained on Wikipedia or on Japanese business news articles.
(5) The pre-training data affects how well a model performs on a specific downstream task.
(6) Ensembling BERT models trained on different corpora can improve accuracy.
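Finding (6) reports that ensembling BERT models trained on different corpora can improve accuracy. The summary does not say how the models were combined. The block below is a minimal sketch that assumes soft voting. Under that assumption, each model outputs a probability vector over the candidate authors, the vectors are averaged (optionally with weights), and the highest-scoring author is chosen.

The function names `soft_vote` and `predict_author`, the optional weights, and the probability values in the demo are hypothetical illustrations. They are not the project's method, data, or results.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_lists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-author probability vectors (soft voting).

    prob_lists[m][k] is model m's probability that the document was
    written by candidate author k (hypothetical model outputs).
    """
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_authors = len(prob_lists[0])
    if any(len(p) != n_authors for p in prob_lists):
        raise ValueError("all models must score the same candidate authors")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # plain, unweighted average
    if len(weights) != len(prob_lists):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [
        sum(w * p[k] for w, p in zip(weights, prob_lists)) / total
        for k in range(n_authors)
    ]


def predict_author(
    prob_lists: Sequence[Sequence[float]],
    authors: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Return the candidate author with the highest averaged probability."""
    avg = soft_vote(prob_lists, weights)
    if len(avg) != len(authors):
        raise ValueError("one author label per probability column is required")
    best = max(range(len(authors)), key=lambda k: avg[k])
    return authors[best]


if __name__ == "__main__":
    # Hypothetical outputs of three BERT classifiers pre-trained on
    # different corpora, for one disputed text and three candidate authors.
    authors = ["A", "B", "C"]
    probs = [
        [0.50, 0.40, 0.10],  # model 1 alone would pick A
        [0.20, 0.50, 0.30],  # model 2 alone would pick B
        [0.30, 0.45, 0.25],  # model 3 alone would pick B
    ]
    print(predict_author(probs, authors))                     # B
    print(predict_author(probs, authors, weights=[5, 1, 1]))  # A
```

With equal weights, the averaged scores favour author B even though the first model on its own prefers A. Giving that model much more weight flips the decision back to A.

Weighted averaging is only one design choice. Hard (majority) voting and stacking a meta-classifier on the models' outputs are common alternatives. If the weights are tuned, they should be chosen on held-out data rather than on the texts being attributed.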