Construction of an Evaluation Dataset and Quality Estimation for Neural Language Generation

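Basic Information

Grant number:
22H03651
Principal investigator:
小町守 (Mamoru Komachi)
Amount:
$111,500
Host institution:
Hitotsubashi University (2023); Tokyo Metropolitan University (2022)
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2022
Funding country:
Japan
Project period:
2022-04-01 to 2025-03-31
Project status:
Ongoing (not yet concluded)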

Project Abstract

This research aims to create datasets for evaluating language generation and to propose highly interpretable automatic evaluation methods. Each piece of test data created can be used as a unit test for a language model and can be incorporated into regression tests during continuous integration. Alongside collecting and creating datasets for each subtask, we propose methods that use them for automatic evaluation. Among language generation tasks, we target grammatical error correction and machine translation, for which positive and negative examples are easy to create, and for each we propose an evaluation dataset and an evaluation method that uses it. For grammatical error correction, the grammar items that learners should acquire can be enumerated, so it is possible to build a comprehensive checklist of whether each item is corrected properly. For machine translation, representative evaluation measures (MQM, direct assessment, etc.) already exist, so we propose a method that can evaluate machine translation at a finer granularity than these. In fiscal year 2022, we conducted preliminary experiments on creating a high-coverage dataset for grammatical error correction and examined its advantages, disadvantages, and limitations. For machine translation, we ran experiments on, and examined the feasibility of, word-level quality estimation using MQM. We also expanded a dataset for quality estimation of text simplification and studied quality estimation methods that use it.

Project Outcomes

Journal articles (4)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

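日本語文法誤り訂正コーパスへの誤用タグ付け

Error tagging of a Japanese grammatical error correction corpus

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
小山碧海; 喜友名朝視顕; 三田雅人; 岡照晃; 小町守
Corresponding author:
小町守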

ProQE: Proficiency-wise Quality Estimation Dataset for Grammatical Error Correction

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Yujin Takahashi; Masahiro Kaneko; Masato Mita; Mamoru Komachi
Corresponding author:
Mamoru Komachi

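日本語文法誤り訂正のための誤用タグ付き評価コーパスの構築

Construction of an error-tagged evaluation corpus for Japanese grammatical error correction

DOI:
Publication date:
2023
Journal:
自然言語処理 (Journal of Natural Language Processing)
Impact factor:
0
Authors:
小山碧海; 喜友名朝視顕; 小林賢治; 新井美桜; 三田雅人; 岡照晃; 小町守
Corresponding author:
小町守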

Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Daisuke Suzuki; Yujin Takahashi; Ikumi Yamashita; Taichi Aida; Tosho Hirasawa; Michitaka Nakatsuji; Masato Mita; Mamoru Komachi
Corresponding author:
Mamoru Komachi