Status | Published |
iCPE: A hybrid data selection model for SMT domain adaptation | |
Wang L. | |
2013-12-01 | |
Conference Name | International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2013) ; China National Conference on Chinese Computational Linguistics |
Source Publication | CCL 2013: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data |
Volume | 8202 LNAI |
Pages | 280-290 |
Conference Date | 10-12 October 2013 |
Conference Place | Suzhou, China |
Publisher | SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY |
Abstract | Data selection is an important technique for enhancing data-driven models, especially in large-scale natural language processing (NLP). Recent research on statistical machine translation (SMT) domain adaptation focuses on the use of various individual data selection models. In this paper, we propose a hybrid data selection model named iCPE, which combines three state-of-the-art similarity metrics, Cosine tf-idf, Perplexity and Edit distance, at both the corpus level and the model level. We conduct experiments on the Hong Kong Law Chinese-English corpus, and the results show that this simple and effective hybrid model outperforms both the baseline system trained on the entire data and the best rival method. The consistent performance gains of the proposed approach have significant implications for mining very large corpora in a computationally limited environment. © Springer-Verlag 2013. |
Keyword | Data Selection ; Domain Adaptation ; Hybrid Model ; Similarity Metrics ; Statistical Machine Translation |
DOI | 10.1007/978-3-642-41491-6_26 |
Indexed By | CPCI-S ; CPCI-SSH |
Language | English |
WOS Research Area | Computer Science ; Information Science & Library Science ; Linguistics |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Information Systems ; Computer Science, Interdisciplinary Applications ; Information Science & Library Science ; Linguistics ; Language & Linguistics |
WOS ID | WOS:000358605100026 |
Scopus ID | 2-s2.0-84893075080 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Wang L. |
Affiliation | Universidade de Macau |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Wang L., Wong D.F., Chao L.S., et al. iCPE: A hybrid data selection model for SMT domain adaptation[C]: SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY, 2013, 280-290. |
APA | Wang L., Wong D.F., Chao L.S., Lu Y., & Xing J. (2013). iCPE: A hybrid data selection model for SMT domain adaptation. CCL 2013: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 8202 LNAI, 280-290. |