Abstract
This study evaluated the performance of four popular large language models (LLMs) — ChatGPT o3-mini, Gemini 2.0 Pro Experimental, DeepSeek Thinking R1, and Kimi Thinking K1.5 — in answering frequently asked patient questions about cataracts and cataract surgery in Chinese. DeepSeek Thinking R1 performed comparably to Gemini 2.0 Pro Experimental in accuracy, while outperforming both ChatGPT o3-mini and Kimi Thinking K1.5. In completeness and consistency, DeepSeek Thinking R1 was superior to the other three LLMs. In legibility and safety, DeepSeek Thinking R1, Gemini 2.0 Pro Experimental, and ChatGPT o3-mini exhibited comparable results, all performing better than Kimi Thinking K1.5. Overall, DeepSeek Thinking R1 demonstrated the strongest performance among the four LLMs in this comparative evaluation. Modern LLMs are promising tools for public education in ophthalmology, although human oversight remains necessary.