Residential College | false |
Status | Published |
Title | Integration of named entity information for Chinese word segmentation based on maximum entropy |
Authors | Leong K.S.; Wong F.; Li Y.; Dong M.C. |
Date | 2008-11-27 |
Conference Name | 4th International Conference on Intelligent Computing |
Source Publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 5226 LNCS |
Pages | 962-969 |
Conference Date | September 15-18, 2008 |
Conference Place | Shanghai, People's Republic of China |
Abstract | Word segmentation is an essential process in Chinese information processing. Although related research has been reported and has made progress, the Unknown Named Entity (UNE) problem in segmentation is not fully solved, which usually degrades overall segmentation accuracy. This paper presents a model that identifies UNEs to improve the overall performance of segmentation. To capture NE information, the functions of characters or words are defined with tags. In addition, useful surrounding contexts are collected from a corpus and used as features. The model is built on Maximum Entropy and treats UNE identification as a tagging problem. Empirical experiments show that the overall accuracy of segmentation improves after the UNE identification module is integrated into the word segmenter. © 2008 Springer-Verlag Berlin Heidelberg. |
DOI | 10.1007/978-3-540-87442-3_118 |
Language | English |
WOS ID | WOS:000259555200118 |
Scopus ID | 2-s2.0-56549095784 |
Document Type | Conference paper |
Collection | University of Macau |
Affiliation | Universidade de Macau |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Leong K.S., Wong F., Li Y., et al. Integration of named entity information for Chinese word segmentation based on maximum entropy[C], 2008, 962-969. |
APA | Leong K.S., Wong F., Li Y., & Dong M.C. (2008). Integration of named entity information for Chinese word segmentation based on maximum entropy. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5226 LNCS, 962-969. |