Residential College: false
Status: Published
TransGEC: Improving Grammatical Error Correction with Translationese
Fang, Tao1; Liu, Xuebo2,3; Wong, Derek F.1; Zhan, Runzhe1; Ding, Liang4; Chao, Lidia S.1; Tao, Dacheng5; Zhang, Min2,3
2023
Conference Name: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Source Publication: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Pages: 3614-3633
Conference Date: 9 July 2023 through 14 July 2023
Conference Place: Toronto
Publisher: Association for Computational Linguistics (ACL)
Abstract

Data augmentation is an effective way to improve model performance of grammatical error correction (GEC). This paper identifies a critical side-effect of GEC data augmentation, which is due to the style discrepancy between the data used in GEC tasks (i.e., texts produced by non-native speakers) and data augmentation (i.e., native texts). To alleviate this issue, we propose to use an alternative data source, translationese (i.e., human-translated texts), as input for GEC data augmentation, which 1) is easier to obtain and usually has better quality than non-native texts, and 2) has a more similar style to non-native texts. Experimental results on the CoNLL14 and BEA19 English, NLPCC18 Chinese, Falko-MERLIN German, and RULEC-GEC Russian GEC benchmarks show that our approach consistently improves correction accuracy over strong baselines. Further analyses reveal that our approach is helpful for overcoming mainstream correction difficulties such as the corrections of frequent words, missing words, and substitution errors. Data, code, models and scripts are freely available at https://github.com/NLP2CT/TransGEC.
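The augmentation idea in the abstract can be illustrated with a minimal sketch. It assumes a simple rule-based noising scheme (random deletion, substitution, and one adjacent swap) applied to clean sentences that stand in for translationese. The function names, probabilities, and the small substitution vocabulary are hypothetical and chosen only for illustration. This is not the authors' pipeline, which is available at the GitHub URL above.

```python
import random


def corrupt(tokens, rng, p_delete=0.1, p_swap=0.1, p_substitute=0.1,
            vocab=("a", "the", "of", "in")):
    """Inject simple synthetic errors into a clean token list.

    All probabilities and the substitution vocabulary are illustrative
    assumptions, not values taken from the paper.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue  # drop the token: a missing-word error
        if r < p_delete + p_substitute:
            noisy.append(rng.choice(vocab))  # replace it: a substitution error
        else:
            noisy.append(tok)  # keep the token unchanged
    # Occasionally swap one adjacent pair to mimic a word-order error.
    if len(noisy) > 1 and rng.random() < p_swap:
        i = rng.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def make_pairs(clean_sentences, seed=0):
    """Build (noisy source, clean target) pairs for GEC-style training."""
    rng = random.Random(seed)
    return [(" ".join(corrupt(s.split(), rng)), s) for s in clean_sentences]


if __name__ == "__main__":
    # Hypothetical stand-ins for human-translated (translationese) sentences.
    sentences = [
        "This sentence was translated from another language .",
        "It stands in for translationese text .",
    ]
    for source, target in make_pairs(sentences, seed=1):
        print(source, "->", target)
```

The clean sentence is always kept as the target, so swapping the input corpus from native text to translationese changes only the style of the data and leaves the noising step untouched.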

Language: English
Scopus ID: 2-s2.0-85174387119
Document Type: Conference paper
Collection: Faculty of Science and Technology
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Liu, Xuebo; Wong, Derek F.
Affiliation:
1. NLP,
2. CT Lab, Department of Computer and Information Science, University of Macau, Macao
3. Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
4. JD Explore Academy, China
5. The University of Sydney, Australia
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Fang, Tao, Liu, Xuebo, Wong, Derek F., et al. TransGEC: Improving Grammatical Error Correction with Translationese[C]: Association for Computational Linguistics (ACL), 2023, 3614-3633.
APA
Fang, T., Liu, X., Wong, D. F., Zhan, R., Ding, L., Chao, L. S., Tao, D., & Zhang, M. (2023). TransGEC: Improving Grammatical Error Correction with Translationese. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 3614-3633.