Residential College: false
Status: Published
iCPE: A hybrid data selection model for SMT domain adaptation
Wang L.; Wong D.F.; Chao L.S.; Lu Y.; Xing J.
2013-12-01
Conference Name: International Symposium on Natural Language Processing Based on Naturally Annotated Big Data China National Conference on Chinese Computational Linguistics NLP-NABD 2013
Source Publication: CCL 2013: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
Volume: 8202 LNAI
Pages: 280-290
Conference Date: 10-12 October 2013
Conference Place: Suzhou, China
Publisher: Springer-Verlag Berlin, Heidelberger Platz 3, D-14197 Berlin, Germany
Abstract

Data selection is an important technique for improving data-driven models, especially in large-scale natural language processing (NLP). Recent research on domain adaptation for statistical machine translation (SMT) has focused on individual data selection models. In this paper, we propose a hybrid data selection model named iCPE, which combines three state-of-the-art similarity metrics, cosine tf-idf, perplexity and edit distance, at both the corpus level and the model level. We conduct experiments on a Hong Kong law Chinese-English corpus, and the results show that this simple and effective hybrid model outperforms both the baseline system trained on the entire data and the best rival method. The consistent performance gain of the proposed approach has a profound implication for mining very large corpora in a computationally limited environment. © Springer-Verlag 2013.
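
The abstract names the three similarity metrics but does not say how they are combined, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a whitespace tokenizer, a smoothed tf-idf weighting, an add-one unigram language model for perplexity, word-level edit distance, and a weighted sum of min-max normalized scores; all function names and the `weights` parameter are hypothetical. It covers sentence selection only and does not model the corpus-level versus model-level combination mentioned above.

```python
import math
from collections import Counter


def tokens(sentence):
    # Assumption: lowercase whitespace tokenization.
    return sentence.lower().split()


def cosine_tfidf(cand, in_domain):
    """Cosine between tf-idf vectors of a candidate and the pooled in-domain text."""
    docs = [tokens(s) for s in in_domain]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))

    def vec(toks):
        # Smoothed idf: log((n + 1) / (df + 1)) + 1.
        return {t: c * (math.log((n + 1) / (df[t] + 1)) + 1.0)
                for t, c in Counter(toks).items()}

    u = vec(tokens(cand))
    v = vec([t for d in docs for t in d])
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def neg_log_perplexity(cand, in_domain):
    """Negative log perplexity under an add-one unigram model (higher is better).

    Assumes the candidate is non-empty.
    """
    counts = Counter(t for s in in_domain for t in tokens(s))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    toks = tokens(cand)
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in toks)
    return log_prob / len(toks)


def edit_similarity(cand, in_domain):
    """1 minus the length-normalized word-level edit distance to the closest in-domain sentence."""
    a = tokens(cand)
    best = 0.0
    for s in in_domain:
        b = tokens(s)
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
            prev = cur
        longest = max(len(a), len(b)) or 1
        best = max(best, 1.0 - prev[-1] / longest)
    return best


def min_max(xs):
    # Rescale scores to [0, 1] so the three metrics are comparable.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def select(candidates, in_domain, k, weights=(1.0, 1.0, 1.0)):
    """Return the k candidates with the highest combined score (assumed weighted sum)."""
    metrics = (cosine_tfidf, neg_log_perplexity, edit_similarity)
    columns = [min_max([m(c, in_domain) for c in candidates]) for m in metrics]
    combined = [sum(w * col[i] for w, col in zip(weights, columns))
                for i in range(len(candidates))]
    order = sorted(range(len(candidates)), key=lambda i: combined[i], reverse=True)
    return [candidates[i] for i in order[:k]]


in_domain = ["the court shall hear the appeal",
             "the ordinance applies to the tenant"]
candidates = ["the court shall dismiss the appeal",
              "i like pizza and football"]
print(select(candidates, in_domain, k=1))
```

In this toy setting the legal-sounding candidate scores higher on all three metrics, so it is the one selected. A real system would replace the unigram model with a proper n-gram language model and tune the weights on held-out data.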

Keyword: Data Selection; Domain Adaptation; Hybrid Model; Similarity Metrics; Statistical Machine Translation
DOI: 10.1007/978-3-642-41491-6_26
Indexed By: CPCI-S; CPCI-SSH
Language: English
WOS Research Area: Computer Science; Information Science & Library Science; Linguistics
WOS Subject: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Information Science & Library Science; Linguistics; Language & Linguistics
WOS ID: WOS:000358605100026
Scopus ID: 2-s2.0-84893075080
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Wang L.
Affiliation: Universidade de Macau
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Wang L., Wong D.F., Chao L.S., et al. iCPE: A hybrid data selection model for SMT domain adaptation[C]: Springer-Verlag Berlin, Heidelberger Platz 3, D-14197 Berlin, Germany, 2013, 280-290.
APA: Wang L., Wong D.F., Chao L.S., Lu Y., & Xing J. (2013). iCPE: A hybrid data selection model for SMT domain adaptation. CCL 2013: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 8202 LNAI, 280-290.