Comparison of the performance of large language models in answering patient questions related to cataract

Abstract

This study evaluated the performance of four popular large language models (LLMs), namely ChatGPT o3-mini, Gemini 2.0 pro experimental, DeepSeek Thinking R1, and Kimi Thinking K1.5, in answering frequently asked patient questions, posed in Chinese, about cataracts and cataract surgery. DeepSeek Thinking R1 performed comparably to Gemini 2.0 pro experimental in accuracy and outperformed both ChatGPT o3-mini and Kimi Thinking K1.5. In completeness and consistency, DeepSeek Thinking R1 outperformed the other three LLMs. In legibility and safety, DeepSeek Thinking R1, Gemini 2.0 pro experimental, and ChatGPT o3-mini performed comparably, and all three outperformed Kimi Thinking K1.5. Overall, DeepSeek Thinking R1 demonstrated the strongest performance of the four LLMs in this comparative evaluation. Modern LLMs are promising tools for public education in ophthalmology, although human oversight is still required.