Abstract
Cataract surgery is one of the most common and effective surgical procedures performed worldwide, yet patient education remains a challenge because of limited health literacy in the general population. Our study evaluated the reliability of large language models (LLMs) in providing accurate, complete, and clear responses to frequently asked questions (FAQs) about cataract surgery. A comprehensive list of 20 cataract surgery FAQs was submitted sequentially as prompts to nine different LLMs. All 180 responses were recorded and scored by two expert ophthalmologists, blinded to model identity, on a 5-point scale measuring accuracy, completeness, and clarity. Interrater agreement was measured using a weighted kappa coefficient, and model performances were compared using the Friedman test with post-hoc analysis. All models performed well in responding to FAQs (79% of responses were scored "excellent"), indicating that LLMs can serve as effective tools for answering patient FAQs. LLaMA 4 and Copilot scored lower on average than the other models (p < .05); however, both remained effective at answering FAQs overall. Expanding LLMs as patient education tools in clinical settings should be considered, given their effectiveness in providing clear, accurate, and complete responses to cataract surgery FAQs.
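The statistical workflow described above (weighted kappa for interrater agreement, Friedman test across models) could be sketched as follows. This is a minimal illustration with synthetic placeholder scores, not the study's actual data; the dimensions (20 FAQs, 9 models, two raters, 5-point scale) follow the design described in the abstract, and the choice of quadratic kappa weights is an assumption.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Synthetic placeholder ratings on a 5-point scale (NOT the study's data):
# two raters each scoring the same 180 responses (20 FAQs x 9 models).
rater1 = rng.integers(3, 6, size=180)
rater2 = np.clip(rater1 + rng.integers(-1, 2, size=180), 1, 5)

# Weighted kappa for interrater agreement; quadratic weighting is assumed here.
kappa = cohen_kappa_score(rater1, rater2, weights="quadratic")
print(f"weighted kappa: {kappa:.3f}")

# Friedman test: each row is one FAQ, each column one model's consensus score.
scores = rng.integers(3, 6, size=(20, 9))
stat, p = friedmanchisquare(*scores.T)
print(f"Friedman chi-square: {stat:.2f}, p = {p:.4f}")
```

A significant Friedman result would then be followed by pairwise post-hoc comparisons (e.g., Nemenyi or Wilcoxon signed-rank with multiplicity correction) to identify which models differ, as the abstract's post-hoc analysis does.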