Residential College | false |
Status | Published
Title | Assessing the ability of self-attention networks to learn word order
Authors | Baosong Yang (1); Longyue Wang (2); Derek F. Wong (1); Lidia S. Chao (1); Zhaopeng Tu (2)
Date Issued | 2019-07
Conference Name | 57th Annual Meeting of the Association for Computational Linguistics (ACL)
Source Publication | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
Pages | 3635-3644 |
Conference Date | JUL 28-AUG 02, 2019 |
Conference Place | Florence, ITALY |
Country | ITALY |
Publication Place | USA |
Publisher | Association for Computational Linguistics |
Abstract | Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g., machine translation. Because they lack the recurrence structure of recurrent neural networks (RNN), SAN are assumed to be weak at learning positional information of words for sequence modeling. However, this speculation has neither been empirically confirmed, nor have explanations been explored for their strong performance on machine translation tasks despite this supposed lack of positional information. To this end, we propose a novel word reordering detection task to quantify how well word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed have difficulty learning positional information, even with position embeddings; and 2) SAN trained on machine translation learn better positional information than their RNN counterpart, in which position embeddings play a critical role. Although the recurrence structure makes the model more universally effective at learning word order, learning objectives matter more in downstream tasks such as machine translation.
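Note | The word reordering detection task described in the abstract can be made concrete with a small sketch. The Python function below is an illustrative assumption, not code from the paper: it randomly moves one token to a new position and returns the two position labels a detection model would be trained to recover (the paper's exact labeling conventions may differ).

```python
import random

def make_reordering_example(tokens):
    """Sketch of the word reordering perturbation: randomly move one
    token to another position. Returns the perturbed sentence together
    with the original and inserted positions that a detection model
    would be trained to predict. (Hypothetical helper; label
    conventions are an assumption, not taken from the paper.)"""
    tokens = list(tokens)  # avoid mutating the caller's list
    assert len(tokens) >= 2, "need at least two tokens to reorder"
    original_pos = random.randrange(len(tokens))
    word = tokens.pop(original_pos)
    # Insert somewhere other than the source index so the word actually moves.
    candidates = [i for i in range(len(tokens) + 1) if i != original_pos]
    inserted_pos = random.choice(candidates)
    tokens.insert(inserted_pos, word)
    return tokens, original_pos, inserted_pos

# Example: perturb a sentence and show the detection targets.
sentence = "the quick brown fox jumps over the lazy dog".split()
perturbed, orig, ins = make_reordering_example(sentence)
print(" ".join(perturbed), "| moved from", orig, "to", ins)
```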
DOI | 10.18653/v1/P19-1354 |
Language | English
WOS Research Area | Computer Science ; Linguistics |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Linguistics |
WOS ID | WOS:000493046106013 |
Scopus ID | 2-s2.0-85072576319 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Zhaopeng Tu |
Affiliation | 1. NLP2CT Lab, Department of Computer and Information Science, University of Macau, Macao; 2. Tencent AI Lab
First Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Baosong Yang, Longyue Wang, Derek F. Wong, et al. Assessing the ability of self-attention networks to learn word order[C], USA: Association for Computational Linguistics, 2019, 3635-3644.
APA | Yang, B., Wang, L., Wong, D. F., Chao, L. S., & Tu, Z. (2019). Assessing the ability of self-attention networks to learn word order. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 3635-3644.
Files in This Item: | There are no files associated with this item. |