Unifying Pre-training and Multilingual Semantic Representation Learning for Low-resource Neural Machine Translation
Basic Information
- Grant Number: 22KJ1843
- Principal Investigator:
- Amount: $10,900
- Host Institution:
- Host Institution Country: Japan
- Project Category: Grant-in-Aid for JSPS Fellows
- Fiscal Year: 2023
- Funding Country: Japan
- Duration: 2023-03-08 to 2024-03-31
- Project Status: Completed
- Source:
- Keywords:
Project Abstract
In the past year, we focused on improving the efficiency of multilingual sentence representation learning and on exploring novel methods for improving multilingual machine translation. Both lines of work advance research on multilingual / low-resource neural machine translation.
(1) We proposed an efficient and effective training method and presented the work at the 2023 Annual Meeting of the Association for Natural Language Processing (言語処理学会). In addition, we proposed knowledge distillation for compressing a large model, which enables efficient model inference; this work has been accepted to the EACL 2023 main conference. These achievements accelerate the collection of parallel sentences for training translation systems: the model training phase becomes 4-16 times faster, and model inference achieves a 2.5-5 times speedup, with even larger gains on downstream tasks.
(2) We explored novel ways to improve multilingual translation systems with a word-level contrastive learning technique and obtained better translation quality for low-resource language pairs; this work was accepted to Findings of NAACL 2022. We also explained the improvements by showing the relationship between BLEU scores and the sentence retrieval performance of the NMT encoder, which suggests that future work can focus on further improving the encoder's retrieval performance in many-to-many NMT and on the feasibility of the contrastive objective in a massively multilingual scenario.
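To make the distillation idea in (1) concrete, here is a minimal sketch, assuming a frozen teacher encoder that produces multilingual sentence embeddings and a smaller student trained to match them; it is an illustration, not the paper's exact recipe, and the names `teacher`, `student`, and `batch` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher_emb: torch.Tensor,
                      student_emb: torch.Tensor) -> torch.Tensor:
    # L2-normalize both embeddings so the student matches the teacher's
    # direction on the unit sphere; assumes matching embedding dimensions
    # (a linear projection on the student side would be needed otherwise).
    t = F.normalize(teacher_emb, dim=-1)
    s = F.normalize(student_emb, dim=-1)
    return F.mse_loss(s, t)

def train_step(teacher, student, batch, optimizer):
    # `teacher` and `student` are hypothetical encoder modules mapping a
    # batch of token ids to pooled sentence embeddings; the teacher stays
    # frozen and only the student's parameters are updated.
    with torch.no_grad():
        t_emb = teacher(batch)
    s_emb = student(batch)
    loss = distillation_loss(t_emb, s_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the compressed student is what runs at inference time, matching the teacher's embedding space is what permits the reported 2.5-5 times inference speedup without retraining downstream components.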
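Similarly, one plausible instantiation of the word-level contrastive objective in (2) is an InfoNCE-style loss over embeddings of aligned word pairs; the sketch below is an assumption, not the exact published loss, and `src_word_emb` / `tgt_word_emb` are hypothetical tensors gathered from the NMT encoder at word-aligned positions.

```python
import torch
import torch.nn.functional as F

def word_contrastive_loss(src_word_emb: torch.Tensor,
                          tgt_word_emb: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    # Row i of each (n_pairs, d) tensor is one aligned source/target
    # word pair, e.g. extracted with an external word aligner.
    src = F.normalize(src_word_emb, dim=-1)
    tgt = F.normalize(tgt_word_emb, dim=-1)
    logits = src @ tgt.t() / temperature          # (n_pairs, n_pairs)
    labels = torch.arange(src.size(0), device=src.device)
    # Diagonal entries are the aligned (positive) pairs; every other
    # target word in the batch serves as an in-batch negative.
    return F.cross_entropy(logits, labels)
```

In training, such an auxiliary term would be weighted and added to the standard NMT cross-entropy loss, pulling aligned words closer in the shared representation space, which is consistent with the observed link between retrieval performance and BLEU.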
Project Outcomes
Journal Articles (11)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation?
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: Zhuoyuan Mao; Chenhui Chu; Raj Dabre; Haiyue Song; Zhen Wan; Sadao Kurohashi
- Corresponding Author: Zhen Wan and Sadao Kurohashi
Textual Enhanced Contrastive Learning for Solving Math Word Problems
- DOI:10.48550/arxiv.2211.16022
- Publication Date: 2022-11
- Journal:
- Impact Factor: 3.9
- Authors: Yibin Shen; Qianying Liu; Zhuoyuan Mao; Fei Cheng; S. Kurohashi
- Corresponding Author: Yibin Shen; Qianying Liu; Zhuoyuan Mao; Fei Cheng; S. Kurohashi
Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation Extraction
- DOI:10.48550/arxiv.2210.11800
- Publication Date: 2022-10
- Journal:
- Impact Factor: 0
- Authors: Zhen Wan; Qianying Liu; Zhuoyuan Mao; Fei Cheng; S. Kurohashi; Jiwei Li
- Corresponding Author: Zhen Wan; Qianying Liu; Zhuoyuan Mao; Fei Cheng; S. Kurohashi; Jiwei Li
Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems
- DOI:10.48550/arxiv.2209.10310
- Publication Date: 2022-09
- Journal:
- Impact Factor: 0
- Authors: Yibin Shen; Qianying Liu; Zhuoyuan Mao; Zhen Wan; Fei Cheng; S. Kurohashi
- Corresponding Author: Yibin Shen; Qianying Liu; Zhuoyuan Mao; Zhen Wan; Fei Cheng; S. Kurohashi
Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation
- DOI:10.1145/3491065
- Publication Date: 2022-01
- Journal:
- Impact Factor: 0
- Authors: Zhuoyuan Mao; Chenhui Chu; S. Kurohashi
- Corresponding Author: Zhuoyuan Mao; Chenhui Chu; S. Kurohashi