Residential College | false
Status | Published
On the Copying Behaviors of Pre-Training for Neural Machine Translation
Liu, Xuebo1; Wang, Longyue2; Wong, Derek F.1; Ding, Liang3; Chao, Lidia S.1; Shi, Shuming2; Tu, Zhaopeng2
2021
Conference Name | The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) |
Source Publication | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 |
Pages | 4265-4275 |
Conference Date | 1 August 2021 through 6 August 2021
Conference Place | Virtual, Online |
Author of Source | Zong C., Xia F., Li W., Navigli R. |
Publisher | Association for Computational Linguistics (ACL) |
Abstract | Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side effect of pre-training for NMT, which arises from the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, pre-training initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called the copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard ones. In response to this problem, we propose a simple and effective method named copying penalty to control copying behaviors in decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors for pre-training-based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
DOI | 10.48550/arXiv.2107.08212 |
Language | English
Scopus ID | 2-s2.0-85114656790 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Liu, Xuebo |
Affiliation | 1.NLP2CT Lab, Department of Computer and Information Science, University of Macau, Macao 2.Tencent AI Lab, China 3.The University of Sydney, Australia |
First Author Affiliation | University of Macau
Corresponding Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Liu, Xuebo, Wang, Longyue, Wong, Derek F., et al. On the Copying Behaviors of Pre-Training for Neural Machine Translation[C]. Zong C., Xia F., Li W., Navigli R.: Association for Computational Linguistics (ACL), 2021: 4265-4275.
APA | Liu, Xuebo, Wang, Longyue, Wong, Derek F., Ding, Liang, Chao, Lidia S., Shi, Shuming, & Tu, Zhaopeng (2021). On the Copying Behaviors of Pre-Training for Neural Machine Translation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 4265-4275.