Residential College: true
Status: Published
Exploiting Translation Model for Parallel Corpus Mining
Leong, Chongman; Liu, Xuebo; Wong, Derek F.; Chao, Lidia S.
2021
Source Publication: IEEE/ACM Transactions on Audio Speech and Language Processing
ISSN: 2329-9290
Volume: 29, Pages: 2829-2839
Abstract

Parallel corpus mining (PCM) is beneficial for many corpus-based natural language processing tasks, e.g., machine translation and bilingual dictionary induction, especially in low-resource languages and domains. It relies heavily on cross-lingual representations to model the interdependencies between different languages and determine whether sentences are parallel or not. In this paper, we take the first step towards exploiting the multilingual Transformer translation model to produce expressive sentence representations for PCM. Since the traditional Transformer lacks an immediate sentence representation, we pool the output representation of the encoder as the sentence representation, which is further optimized as a part of the training flow of the translation model. Experiments conducted on the BUCC PCM task show that the proposed method improves mining performance over the existing methods with the assistance of the pre-trained multilingual BERT. To further test the usability of the proposed method, we mine parallel sentences from public resources and find that the mined sentences can indeed enhance low-resource machine translation.
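
The pooling and mining steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which pooling operator or scoring rule the authors use, so mean pooling over non-padding tokens, cosine similarity, and a fixed threshold are assumptions made here for illustration only. The function names (`mean_pool`, `cosine`, `mine_pairs`) and the threshold value are hypothetical, and plain Python lists stand in for the encoder outputs of a translation model.

```python
import math


def mean_pool(token_vectors, mask):
    """Average the encoder output vectors of the non-padding tokens.

    Assumption: mean pooling; the abstract only says the encoder output is pooled.
    """
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("sentence has no non-padding tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src_vecs, tgt_vecs, threshold=0.9):
    """For each source sentence, keep its best-scoring target if it clears the threshold.

    Assumption: nearest-neighbour search with a fixed cosine threshold,
    not necessarily the scoring rule used in the paper.
    """
    if not tgt_vecs:
        return []
    pairs = []
    for i, src in enumerate(src_vecs):
        scores = [cosine(src, tgt) for tgt in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    # Toy encoder outputs: three token vectors, the last one is padding.
    sentence = mean_pool([[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]], [1, 1, 0])
    print(sentence)
    print(mine_pairs([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.1], [1.0, 1.0]]))
```

In this sketch each source sentence is matched independently, so two source sentences may select the same target; a real mining pipeline would typically add a tie-breaking or margin-based criterion.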

Keywords: Chinese-Portuguese Translation; Neural Machine Translation; Parallel Corpus Mining
DOI: 10.1109/TASLP.2021.3105798
Indexed By: SCIE
Language: English
WOS Research Area: Acoustics; Engineering
WOS Subject: Acoustics; Engineering, Electrical & Electronic
WOS ID: WOS:000692566900002
Scopus ID: 2-s2.0-85113203234
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Wong, Derek F.
Affiliation: Natural Language Processing and Portuguese-Chinese Machine Translation (NLP2CT) Laboratory, University of Macau, Macau, Macao
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Leong, Chongman, Liu, Xuebo, Wong, Derek F., et al. Exploiting Translation Model for Parallel Corpus Mining[J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2021, 29, 2829-2839.
APA: Leong, Chongman, Liu, Xuebo, Wong, Derek F., & Chao, Lidia S. (2021). Exploiting Translation Model for Parallel Corpus Mining. IEEE/ACM Transactions on Audio Speech and Language Processing, 29, 2829-2839.
MLA: Leong, Chongman, et al. "Exploiting Translation Model for Parallel Corpus Mining." IEEE/ACM Transactions on Audio Speech and Language Processing 29 (2021): 2829-2839.