Residential College | false |
Status | Published |
Title | Overview of EvaHan2024: The First International Evaluation on Ancient Chinese Sentence Segmentation and Punctuation |
Author | Li, Bin1,2,3; Chang, Bolin1,2; Xu, Zhixing1,2; Feng, Minxuan1,2; Xu, Chao1,2; Qu, Weiguang2,4; Shen, Si5; Wang, Dongbo2,6 |
Date | 2024-05 |
Conference Name | 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 |
Source Publication | Workshop Proceedings - 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 at LREC-COLING 2024 |
Pages | 229-236 |
Conference Date | 25 May 2024 |
Conference Place | Torino |
Country | Italy |
Publisher | European Language Resources Association (ELRA) |
Abstract | Ancient Chinese texts have no sentence boundaries or punctuation. Adding modern Chinese punctuation to these texts requires expertise, time, and effort. Automatic sentence segmentation and punctuation is considered a basic task for Ancient Chinese processing, but there has been no shared task to evaluate the performance of different systems. This paper presents the results of the first ancient Chinese sentence segmentation and punctuation bakeoff, held at the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) 2024. The contest uses metrics for detailed evaluations of 4 genres of unpublished texts with 11 punctuation types. Six teams submitted 32 running results. In the closed modality, where participants are only allowed to use the training data, the highest F1 scores obtained are 88.47% in sentence segmentation and 75.29% in sentence punctuation. Performance on the unseen data is 10 percent lower than on the published common data, which means there is still room for further improvement. Large language models outperform traditional models, but LLMs change around 1-2% of the original characters due to over-generation. Thus, post-processing is needed to keep the text consistent. |
Keyword | Ancient Chinese; Evaluation; Sentence Punctuation; Sentence Segmentation |
Language | English |
Scopus ID | 2-s2.0-85195189774 |
Document Type | Conference paper |
Collection | Faculty of Arts and Humanities |
Corresponding Author | Wang, Dongbo |
Affiliation | 1. School of Chinese Language and Literature, Nanjing Normal University, China; 2. Center for Language Big Data and Computational Humanities, Nanjing Normal University, China; 3. Faculty of Arts and Humanities, University of Macau, Macao; 4. School of Computer and Electronic Information, Nanjing Normal University, China; 5. School of Economics and Management, Nanjing University of Science and Technology, China; 6. College of Information Management, Nanjing Agricultural University, China |
First Author Affiliation | Faculty of Arts and Humanities |
Recommended Citation GB/T 7714 | Li, Bin,Chang, Bolin,Xu, Zhixing,et al. Overview of EvaHan2024: The First International Evaluation on Ancient Chinese Sentence Segmentation and Punctuation[C]:European Language Resources Association (ELRA), 2024, 229-236. |
APA | Li, Bin., Chang, Bolin., Xu, Zhixing., Feng, Minxuan., Xu, Chao., Qu, Weiguang., Shen, Si., & Wang, Dongbo (2024). Overview of EvaHan2024: The First International Evaluation on Ancient Chinese Sentence Segmentation and Punctuation. Workshop Proceedings - 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 at LREC-COLING 2024, 229-236. |