Evaluating the reliability of large language models in answering FAQs for cataract surgery


Abstract

Cataract surgery is one of the most common and effective surgeries performed worldwide, yet patient education remains a challenge due to limited health literacy in the general population. Our study evaluated the reliability of different large language models (LLMs) in providing accurate, complete, and clear responses to frequently asked questions (FAQs) related to cataract surgery. A comprehensive list of 20 FAQs about cataract surgery was submitted sequentially as prompts to nine different LLMs. All 180 answers were recorded and scored by two expert ophthalmologists, blinded to model type, on a 5-point scale measuring accuracy, completeness, and clarity. Interrater agreement was measured using a weighted kappa coefficient, and model performance was compared using the Friedman test with post-hoc analysis. Our results showed that all models performed well in responding to FAQs (79% of responses scored "excellent"), serving as effective tools for answering patient FAQs. LLaMA 4 and Copilot scored lower on average relative to the other models (p < .05) but remained effective at answering FAQs overall. Expansion of LLMs as patient education tools in clinical settings should be considered, as they are effective in providing clear, accurate, and complete responses to cataract surgery FAQs.
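The statistical workflow described above (two raters scoring on a 5-point scale, agreement via weighted kappa, model comparison via the Friedman test) can be sketched in Python. The scores below are illustrative placeholders, not the study's data, and the three model arrays are hypothetical:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import friedmanchisquare

# Hypothetical 5-point scores from two blinded raters on 10 responses
# (illustrative only; the study used 180 responses and 20 FAQs).
rater1 = [5, 4, 5, 3, 5, 4, 5, 5, 2, 4]
rater2 = [5, 4, 4, 3, 5, 5, 5, 5, 3, 4]

# Interrater agreement: weighted kappa (quadratic weights penalize
# larger disagreements on the ordinal scale more heavily).
kappa = cohen_kappa_score(rater1, rater2, weights="quadratic")

# Model comparison: Friedman test on repeated measures — each of three
# hypothetical models scored on the same 10 FAQs.
model_a = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4]
model_b = [5, 4, 4, 5, 5, 4, 5, 4, 5, 4]
model_c = [3, 4, 3, 4, 3, 3, 4, 3, 4, 3]
stat, p = friedmanchisquare(model_a, model_b, model_c)

print(f"weighted kappa = {kappa:.3f}")
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}")
```

The Friedman test is appropriate here because the same 20 FAQs are scored for every model (a non-parametric repeated-measures design); a significant result would then be followed by pairwise post-hoc comparisons, as the abstract describes.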
