Status: Published
UM-PCorpus: A Large Portuguese-Chinese Parallel Corpus
Lidia S. Chao¹; Derek F. Wong¹; Chi Hong Ao¹; Ana Luísa Leal²
2018
Conference Name: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Source Publication: LREC 2018 Conference Proceedings
Conference Date: May 7-12, 2018
Conference Place: Miyazaki, Japan
Publication Place: Paris, France
Publisher: ELRA – European Language Resources Association
Abstract

This paper describes the creation of a high-quality parallel corpus for Portuguese and Chinese, extracted from parallel and comparable documents. The corpus is constructed using an on-line alignment platform, UM-pAligner. UM-pAligner consists of two main alignment components, a parallel sentence identification component and a classification model, which acquire parallel sentences from either parallel or comparable texts in a semi-automatic manner. The extracted parallel sentences are manually verified. The resulting corpus comprises around 6 million parallel sentences covering newswire, legal, subtitle, technical and general on-line publication texts. About 1 million parallel sentences have been compiled and made available for download at the NLP2CT website.

Keywords: Portuguese-Chinese; Alignment Platform; UM-pAligner; Parallel Corpus; Machine Translation
Language: English
Document Type: Conference paper
Collection: DEPARTMENT OF PORTUGUESE
Affiliation: 1. NLP2CT Lab / Department of Computer and Information Science, University of Macau, Macau SAR, China
2. Department of Portuguese, University of Macau, Macau SAR, China
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Lidia S. Chao, Derek F. Wong, Chi Hong Ao, et al. UM-PCorpus: A Large Portuguese-Chinese Parallel Corpus[C]. Paris, France: ELRA – European Language Resources Association, 2018.
APA: Lidia S. Chao, Derek F. Wong, Chi Hong Ao, & Ana Luísa Leal (2018). UM-PCorpus: A Large Portuguese-Chinese Parallel Corpus. LREC 2018 Conference Proceedings.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.