Evaluating LLMs' grammatical error correction performance in learner Chinese

评估语言学习模型在汉语学习者语法错误纠正方面的表现

阅读:1

Abstract

Large language models (LLMs) have recently exhibited significant capabilities in various English NLP tasks. However, their performance in Chinese grammatical error correction (CGEC) remains unexplored. This study evaluates the abilities of state-of-the-art LLMs in correcting learner Chinese errors from a corpus linguistic perspective. The performance of LLMs is assessed using standard evaluation metrics of MaxMatch score. Keyword and key n-gram analyses are conducted to quantitatively explore linguistic features that differentiate LLM outputs from those of human annotators. LLMs' performance in syntactic and semantic dimensions is further qualitatively analyzed based on these probes of keywords and key n-grams. Results show that LLMs achieve a relatively higher performance in test datasets with multiple annotators and low performance in those with a single annotator. LLMs tend to overcorrect wrong sentences, under the explicit prompt of the "minimal edit" strategy, by using more linguistic devices to generate fluent and grammatical sentences. Furthermore, they struggle with under-correction and hallucination in reasoning-dependent situations. These findings highlight the strengths and limitations of LLMs in CGEC, suggesting that future efforts should focus on refining overcorrection tendencies and improving the handling of complex semantic contexts.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。