Status | Published |
UM-$p$Aligner: Neural Network Based Parallel Sentence Identification Model | |
Leong, C.; Wong, D. F.; Chao, L. S. | |
2018-05-01 | |
Source Publication | 11th Workshop on Building and Using Comparable Corpora |
Pages | 53-58 |
Abstract | This paper describes UM-pAligner, a system for the parallel sentence identification shared task of BUCC 2018. The system consists of two main components: an alignment candidate identification model and a classification model. For identification, we propose using an orthogonal denoising autoencoder to transform the embedding features of parallel sentences into shared and private latent spaces, with the objective of better capturing the translation correspondences between parallel sentences. For classification, a maximum entropy classifier selects the parallel sentences from the candidate list. On the Chinese-English track data, UM-pAligner achieves a retrieval rate of up to 83.65% in the identification phase when the n-best size is set to 80. The classification model obtains F1-scores of 73.47%, 58.54% and 56.00% on the sample, training and test data, respectively. |
Keyword | parallel sentence classification orthogonal denoising autoencoder neural model maximum entropy |
Language | English |
The Source to Article | PB_Publication |
PUB ID | 39302 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Wong, D. F. |
Recommended Citation GB/T 7714 | Leong, C., Wong, D. F., Chao, L. S. UM-$p$Aligner: Neural Network Based Parallel Sentence Identification Model[C], 2018, 53-58. |
APA | Leong, C., Wong, D. F., & Chao, L. S. (2018). UM-$p$Aligner: Neural Network Based Parallel Sentence Identification Model. 11th Workshop on Building and Using Comparable Corpora, 53-58. |
Files in This Item: | There are no files associated with this item. |
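The abstract reports two kinds of figures: a retrieval rate for the n-best candidate list and an F1-score for the final classification. The sketch below shows one common way such metrics are computed. It is an illustration only, not the authors' evaluation code: the function names, the sentence identifiers, and the toy data are all hypothetical, and the official BUCC scoring may differ in detail.

```python
def retrieval_rate(candidates, gold, n):
    """Fraction of source sentences whose gold target is among the top-n candidates.

    candidates: dict mapping a source id to a ranked list of target ids (assumed format)
    gold:       dict mapping a source id to its true target id (assumed format)
    """
    hits = sum(1 for src, tgt in gold.items() if tgt in candidates.get(src, [])[:n])
    return hits / len(gold)


def f1_score(predicted_pairs, gold_pairs):
    """Harmonic mean of precision and recall over sets of (source, target) pairs."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three Chinese sentences with ranked English candidates.
candidates = {
    "zh1": ["en3", "en1"],
    "zh2": ["en2"],
    "zh3": ["en5", "en4"],
}
gold = {"zh1": "en1", "zh2": "en2", "zh3": "en3"}

rate_at_1 = retrieval_rate(candidates, gold, n=1)
rate_at_2 = retrieval_rate(candidates, gold, n=2)

# Hypothetical classifier output: two correct pairs and one wrong pair.
predicted = [("zh1", "en1"), ("zh2", "en2"), ("zh3", "en4")]
f1 = f1_score(predicted, gold.items())
```

In this toy setup, a larger n can only keep or raise the retrieval rate, which is why the abstract states the rate together with the n-best size. The F1-score is computed after the classifier has filtered the candidate list, so it also penalizes gold pairs that never made it into the list.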