Residential College | true |
Status | Published |
Title | Exploiting Translation Model for Parallel Corpus Mining |
Authors | Leong, Chongman; Liu, Xuebo; Wong, Derek F.; Chao, Lidia S. |
Year | 2021 |
Source Publication | IEEE/ACM Transactions on Audio Speech and Language Processing |
ISSN | 2329-9290 |
Volume | 29 |
Pages | 2829-2839 |
Abstract | Parallel corpus mining (PCM) is beneficial for many corpus-based natural language processing tasks, e.g., machine translation and bilingual dictionary induction, especially in low-resource languages and domains. It relies heavily on cross-lingual representations to model the interdependencies between different languages and determine whether sentences are parallel or not. In this paper, we take the first step towards exploiting the multilingual Transformer translation model to produce expressive sentence representations for PCM. Since the traditional Transformer lacks an immediate sentence representation, we pool the output representation of the encoder as the sentence representation, which is further optimized as a part of the training flow of the translation model. Experiments conducted on the BUCC PCM task show that the proposed method improves mining performance over the existing methods with the assistance of the pre-trained multilingual BERT. To further test the usability of the proposed method, we mine parallel sentences from public resources and find that the mined sentences can indeed enhance low-resource machine translation. |
Keyword | Chinese-Portuguese Translation ; Neural Machine Translation ; Parallel Corpus Mining |
DOI | 10.1109/TASLP.2021.3105798 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Acoustics ; Engineering |
WOS Subject | Acoustics ; Engineering, Electrical & Electronic |
WOS ID | WOS:000692566900002 |
Scopus ID | 2-s2.0-85113203234 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Wong, Derek F. |
Affiliation | Natural Language Processing and Portuguese-Chinese Machine Translation (NLP2CT) Laboratory, University of Macau, Macau, Macao |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Leong, Chongman, Liu, Xuebo, Wong, Derek F., et al. Exploiting Translation Model for Parallel Corpus Mining[J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2021, 29: 2829-2839. |
APA | Leong, Chongman, Liu, Xuebo, Wong, Derek F., & Chao, Lidia S. (2021). Exploiting Translation Model for Parallel Corpus Mining. IEEE/ACM Transactions on Audio Speech and Language Processing, 29, 2829-2839. |
MLA | Leong, Chongman, et al. "Exploiting Translation Model for Parallel Corpus Mining." IEEE/ACM Transactions on Audio Speech and Language Processing 29 (2021): 2829-2839. |
Files in This Item: | There are no files associated with this item. |