Status: Published
Challenges of Neural Machine Translation for Short Texts
Wan, Yu (1); Yang, Baosong (2); Wong, Derek Fai (1); Chao, Lidia Sam (1); Yao, Liang (2); Zhang, Haibo (2); Chen, Boxing (2)
2022-06-09
Source Publication: Computational Linguistics
ISSN: 0891-2017
Volume: 48, Issue: 2, Pages: 321-342
Abstract

Short texts (STs) are present in a variety of scenarios, including queries, dialog, and entity names. Most of the exciting studies in neural machine translation (NMT) focus on open problems concerning long sentences rather than short ones. The intuition behind this is that, with respect to human learning and processing, short sequences are generally regarded as easy examples. In this article, we first dispel this speculation through preliminary experiments, showing that the conventional state-of-the-art NMT approach, namely TRANSFORMER (Vaswani et al. 2017), still suffers from over-translation and mistranslation errors on STs. After empirically investigating the rationale behind this, we summarize two challenges in NMT for STs, each associated with one of the error types above: (1) the imbalanced length distribution in the training set worsens model inference calibration on STs, leading to more over-translation cases on STs; and (2) the lack of contextual information forces NMT to have higher data uncertainty on short sentences, and thus the NMT model is troubled by considerable mistranslation errors. Some existing approaches, such as balancing the data distribution for training (e.g., data upsampling) and complementing contextual information (e.g., introducing translation memory), can alleviate the translation issues in NMT for STs. We encourage researchers to investigate other challenges in NMT for STs, thus reducing ST translation errors and enhancing translation quality.
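
The data upsampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `upsample_short_pairs`, the whitespace tokenization, the length threshold `max_len`, and the repetition `factor` are all hypothetical choices made only for illustration. The idea shown is simply to repeat short sentence pairs so that they make up a larger share of the training set.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def upsample_short_pairs(
    pairs: List[Pair],
    max_len: int = 5,   # hypothetical threshold: sources with <= max_len tokens count as "short"
    factor: int = 3,    # hypothetical repetition count for short pairs
    seed: int = 0,
) -> List[Pair]:
    """Return a shuffled corpus in which each short pair appears `factor` times."""
    out: List[Pair] = []
    for src, tgt in pairs:
        # Whitespace tokenization is an assumption; a real pipeline would
        # measure length with its own tokenizer or subword segmenter.
        copies = factor if len(src.split()) <= max_len else 1
        out.extend([(src, tgt)] * copies)
    # Shuffle with a fixed seed so repeated pairs are not adjacent and runs are reproducible.
    random.Random(seed).shuffle(out)
    return out
```

Calling `upsample_short_pairs(corpus)` on a list of `(source, target)` tuples leaves long pairs untouched and repeats each short pair `factor` times. Suitable values for the threshold and factor depend on the length distribution of the corpus at hand.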

DOI: 10.1162/coli_a_00435
Language: English
Publisher: MIT Press Journals
Scopus ID: 2-s2.0-85131867862
Document Type: Journal article
Collection: Faculty of Science and Technology; Department of Computer and Information Science
Corresponding Author: Yang, Baosong; Wong, Derek Fai
Affiliation:
1. NLP2CT Lab, University of Macau, Macao
2. Alibaba Group, China
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Wan, Yu, Yang, Baosong, Wong, Derek Fai, et al. Challenges of Neural Machine Translation for Short Texts[J]. Computational Linguistics, 2022, 48(2), 321-342.
APA: Wan, Y., Yang, B., Wong, D. F., Chao, L. S., Yao, L., Zhang, H., & Chen, B. (2022). Challenges of Neural Machine Translation for Short Texts. Computational Linguistics, 48(2), 321-342.
MLA: Wan, Yu, et al. "Challenges of Neural Machine Translation for Short Texts." Computational Linguistics 48.2 (2022): 321-342.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.