Status: Published
Overview of EvaHan2024: The First International Evaluation on Ancient Chinese Sentence Segmentation and Punctuation
Li, Bin (1,2,3); Chang, Bolin (1,2); Xu, Zhixing (1,2); Feng, Minxuan (1,2); Xu, Chao (1,2); Qu, Weiguang (2,4); Shen, Si (5); Wang, Dongbo (2,6)
2024-05
Conference Name: 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024
Source Publication: Workshop Proceedings - 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 at LREC-COLING 2024
Pages: 229-236
Conference Date: 25 May 2024
Conference Place: Torino
Country: Italy
Publisher: European Language Resources Association (ELRA)
Abstract

Ancient Chinese texts have no sentence boundaries or punctuation. Adding modern Chinese punctuation to these texts requires expertise, time, and effort. Automatic sentence segmentation and punctuation is considered a basic task in Ancient Chinese processing, but there has been no shared task to evaluate the performance of different systems. This paper presents the results of the first Ancient Chinese sentence segmentation and punctuation bakeoff, held at the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) 2024. The contest uses metrics for detailed evaluation of 4 genres of unpublished texts with 11 punctuation types. Six teams submitted 32 runs. In the closed modality, in which participants may use only the training data, the highest F1 scores are 88.47% for sentence segmentation and 75.29% for sentence punctuation. Performance on the unseen data is 10 percent lower than on the published common data, which means there is still room for improvement. Large language models outperform traditional models, but they change around 1-2% of the original characters through over-generation. Post-processing is therefore needed to keep the text consistent.
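
The two ideas in the abstract, boundary-level F1 scoring and a consistency check against over-generation, can be illustrated with a minimal sketch. It is not the official EvaHan2024 scorer. The punctuation inventory `PUNCT`, the helper names, and the choice to score only boundary positions (ignoring which mark is used) are all assumptions made for illustration.

```python
# Assumed inventory of 11 marks; the abstract gives the count but not the list.
PUNCT = set("，。、：；？！“”《》")


def strip_punct(text: str) -> str:
    """Remove every assumed punctuation mark, leaving only the characters."""
    return "".join(ch for ch in text if ch not in PUNCT)


def boundaries(text: str) -> set[int]:
    """Offsets in the unpunctuated text at which some mark occurs."""
    marks, pos = set(), 0
    for ch in text:
        if ch in PUNCT:
            marks.add(pos)  # a break falls after `pos` characters
        else:
            pos += 1
    return marks


def segmentation_f1(gold: str, pred: str) -> float:
    """F1 over boundary positions; the type of mark is ignored (assumption)."""
    g, p = boundaries(gold), boundaries(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def is_consistent(source: str, output: str) -> bool:
    """True if the system output keeps the original characters unchanged."""
    return strip_punct(output) == source


source = "學而時習之不亦說乎"
gold = "學而時習之，不亦說乎？"
pred = "學而時習之不亦說乎？"       # misses the comma
altered = "學而時習之，不亦悅乎？"  # one character rewritten by the model

score = segmentation_f1(gold, pred)
```

In this toy case the prediction finds one of the two gold boundaries with no false positives, so precision is 1 and recall is 0.5. A post-processing step could call `is_consistent` on each output and fall back to re-aligning the predicted marks onto the source text when it returns `False`, as it does for `altered`.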

Keywords: Ancient Chinese; Evaluation; Sentence Punctuation; Sentence Segmentation
Language: English
Scopus ID: 2-s2.0-85195189774
Document Type: Conference paper
Collection: Faculty of Arts and Humanities
Corresponding Author: Wang, Dongbo
Affiliation:
1. School of Chinese Language and Literature, Nanjing Normal University, China
2. Center for Language Big Data and Computational Humanities, Nanjing Normal University, China
3. Faculty of Arts and Humanities, University of Macau, Macao
4. School of Computer and Electronic Information, Nanjing Normal University, China
5. School of Economics and Management, Nanjing University of Science and Technology, China
6. College of Information Management, Nanjing Agricultural University, China
First Author Affiliation: Faculty of Arts and Humanities
Recommended Citation
GB/T 7714
Li, Bin,Chang, Bolin,Xu, Zhixing,et al. Overview of EvaHan2024: The First International Evaluation on Ancient Chinese Sentence Segmentation and Punctuation[C]:European Language Resources Association (ELRA), 2024, 229-236.
APA Li, Bin., Chang, Bolin., Xu, Zhixing., Feng, Minxuan., Xu, Chao., Qu, Weiguang., Shen, Si., & Wang, Dongbo (2024). Overview of EvaHan2024: The First International Evaluation on Ancient Chinese Sentence Segmentation and Punctuation. Workshop Proceedings - 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 at LREC-COLING 2024, 229-236.