Residential College: false
Status: Published
On the Copying Behaviors of Pre-Training for Neural Machine Translation
Liu, Xuebo (1); Wang, Longyue (2); Wong, Derek F. (1); Ding, Liang (3); Chao, Lidia S. (1); Shi, Shuming (2); Tu, Zhaopeng (2)
2021
Conference Name: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
Source Publication: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Pages: 4265-4275
Conference Date: 1 August 2021 through 6 August 2021
Conference Place: Virtual, Online
Author of Source: Zong C., Xia F., Li W., Navigli R.
Publisher: Association for Computational Linguistics (ACL)
Abstract

Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called the copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard ones. In response to this problem, we propose a simple and effective method named copying penalty to control copying behaviors during decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling the copying behaviors of pre-training-based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
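
To make the copying-ratio idea in the abstract concrete, the following is a minimal Python sketch, assuming a simple token-overlap definition: the fraction of hypothesis tokens that also occur in the source sentence. This is an illustration only; the paper's exact definition and the copying-penalty decoding method are in the authors' code at https://github.com/SunbowLiu/CopyingPenalty.

```python
# Illustrative-only sketch of a token-level "copying ratio": the share of
# hypothesis tokens that also appear in the source sentence. The counting
# scheme (each source token may be "copied" at most once) and whitespace
# tokenization are assumptions for this example, not the paper's definition.
from collections import Counter


def copying_ratio(source_tokens, hypothesis_tokens):
    """Fraction of hypothesis tokens that can be matched to source tokens."""
    if not hypothesis_tokens:
        return 0.0
    remaining = Counter(source_tokens)
    copied = 0
    for token in hypothesis_tokens:
        if remaining[token] > 0:
            copied += 1
            remaining[token] -= 1
    return copied / len(hypothesis_tokens)


# A hypothesis that leaves source words untranslated gets a higher ratio.
src = "der schnelle braune Fuchs".split()
print(copying_ratio(src, "the quick brown fox".split()))   # 0.0
print(copying_ratio(src, "der quick braune fox".split()))  # 0.5
```

Under this toy definition, a copy-heavy hypothesis scores higher than a fully translated one, which is the behavior the proposed copying penalty is designed to keep in check during decoding.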

DOI: 10.48550/arXiv.2107.08212
URL: View the original
Language: English
Scopus ID: 2-s2.0-85114656790
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Liu, Xuebo
Affiliation:
1. NLP2CT Lab, Department of Computer and Information Science, University of Macau, Macao
2. Tencent AI Lab, China
3. The University of Sydney, Australia
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Liu, Xuebo, Wang, Longyue, Wong, Derek F., et al. On the Copying Behaviors of Pre-Training for Neural Machine Translation[C]. Zong C., Xia F., Li W., Navigli R.: Association for Computational Linguistics (ACL), 2021, 4265-4275.
APA
Liu, Xuebo, Wang, Longyue, Wong, Derek F., Ding, Liang, Chao, Lidia S., Shi, Shuming, & Tu, Zhaopeng (2021). On the Copying Behaviors of Pre-Training for Neural Machine Translation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 4265-4275.
Files in This Item:
There are no files associated with this item.
Google Scholar
Similar articles in Google Scholar
[Liu, Xuebo]'s Articles
[Wang, Longyue]'s Articles
[Wong, Derek F.]'s Articles
Baidu academic
Similar articles in Baidu academic
[Liu, Xuebo]'s Articles
[Wang, Longyue]'s Articles
[Wong, Derek F.]'s Articles
Bing Scholar
Similar articles in Bing Scholar
[Liu, Xuebo]'s Articles
[Wang, Longyue]'s Articles
[Wong, Derek F.]'s Articles
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.