Benchmarking large language models for medical education: performance on the clinical laboratory technician qualification examination

大型语言模型在医学教育中的基准测试:临床实验室技术员资格考试的表现

阅读:1

Abstract

Large language models (LLMs) have shown growing applications in medicine, yet their capabilities in the field of clinical laboratory technology remain underexplored. This study aims to evaluate the performance of LLMs in the Chinese Clinical Laboratory Technologist Qualification Examination (CCLTQE) and provide empirical evidence for their application in laboratory medicine. A dataset containing 1,600 single-choice questions is constructed for the CCLTQE exam. The dataset covers four sections: clinical laboratory fundamentals, other medical knowledge related to clinical laboratory technology, clinical laboratory specialized knowledge, and clinical laboratory professional practice competence. We select 12 LLMs for evaluation, including the DeepSeek, GPT, Llama, Qwen, and Gemma series. Results show that Qwen3-235B achieves the highest overall accuracy (89.93%), followed by DeepSeek-R1 (89.75%) and QwQ-32B (89.22%). This study demonstrates that LLMs optimized for Chinese language and domain-specific content demonstrate outstanding performance in CCLTQE, indicating significant potential for AI-assisted education and practice in laboratory medicine.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。