Status: Published
Learning deep transformer models for machine translation
Wang, Qiang1; Li, Bei1; Xiao, Tong1; Zhu, Jingbo1,2; Li, Changliang3; Wong, Derek F.4; Chao, Lidia S.4
2020
Conference Name: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Source Publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 1810-1822
Conference Date: 28 July - 2 August 2019
Conference Place: Florence
Abstract

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT'16 English-German, NIST OpenMT'12 Chinese-English and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4~2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big.
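
The abstract attributes the deep model's success to two ingredients: applying layer normalization before each sub-layer (pre-norm residual connections) and feeding each layer a learned combination of all previous layers' outputs. The sketch below illustrates both ideas; it is a minimal illustration based only on the abstract, not the authors' implementation. The module names, dimensions, feed-forward-only sub-layers, and uniform initialization of the combination weights are assumptions.

```python
# Minimal sketch (not the authors' code) of the two ideas named in the abstract:
# (1) pre-norm residual blocks, i.e. LayerNorm applied before each sub-layer, and
# (2) giving each layer a learned linear combination of all previous layers' outputs.
# Dimensions, the feed-forward-only sub-layer, and the initialization are assumptions.
import torch
import torch.nn as nn


class PreNormBlock(nn.Module):
    """Pre-norm residual wrapper: x + sublayer(LayerNorm(x))."""

    def __init__(self, d_model, sublayer):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.sublayer(self.norm(x))


class CombinedLayerEncoder(nn.Module):
    """Deep encoder whose l-th layer reads a weighted sum of outputs 0..l-1."""

    def __init__(self, d_model=512, n_layers=30):
        super().__init__()
        # Real Transformer layers also contain self-attention; a feed-forward
        # sub-layer is used here only to keep the sketch short.
        self.layers = nn.ModuleList(
            PreNormBlock(d_model, nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                nn.Linear(4 * d_model, d_model)))
            for _ in range(n_layers))
        # Learned combination weights: row l holds the weights over outputs 0..l-1,
        # initialized to a uniform average (an assumed, illustrative initialization).
        weights = torch.zeros(n_layers + 1, n_layers + 1)
        for l in range(1, n_layers + 1):
            weights[l, :l] = 1.0 / l
        self.combine = nn.Parameter(weights)

    def forward(self, x):
        outputs = [x]                       # output 0 is the input embedding
        for l, layer in enumerate(self.layers, start=1):
            w = self.combine[l, :l]         # weights over all earlier outputs
            mixed = sum(w_i * o for w_i, o in zip(w, outputs))
            outputs.append(layer(mixed))
        return outputs[-1]


if __name__ == "__main__":
    enc = CombinedLayerEncoder(d_model=8, n_layers=4)
    print(enc(torch.randn(2, 5, 8)).shape)  # torch.Size([2, 5, 8])
```

The paper's own formulation includes details not shown here, so this should be read as an orientation aid rather than a faithful reimplementation.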

Indexed By: CPCI-S ; CPCI-SSH
Language: English
WOS Research Area: Computer Science ; Linguistics
WOS Subject: Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Linguistics
WOS ID: WOS:000493046103030
Scopus ID: 2-s2.0-85084061446
Citation statistics
Cited Times [WOS]: 228
Document Type: Conference paper
Collection: Department of Computer and Information Science
Corresponding Author: Xiao, Tong
Affiliation:
1. NLP Lab, Northeastern University, Shenyang, China
2. NiuTrans Co., Ltd., Shenyang, China
3. Kingsoft AI Lab, Beijing, China
4. NLP2CT Lab, University of Macau, Macao
Recommended Citation
GB/T 7714
Wang, Qiang, Li, Bei, Xiao, Tong, et al. Learning deep transformer models for machine translation[C], 2020: 1810-1822.
APA Wang, Q., Li, B., Xiao, T., Zhu, J., Li, C., Wong, D. F., & Chao, L. S. (2020). Learning deep transformer models for machine translation. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 1810-1822.
Files in This Item:
There are no files associated with this item.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.