Comparison of the Accuracy, Comprehensiveness, and Readability of ChatGPT, Google Gemini, and Microsoft Copilot on Dry Eye Disease



Abstract

OBJECTIVES: This study compared the performance of ChatGPT, Google Gemini, and Microsoft Copilot in answering 25 questions about dry eye disease, evaluating the comprehensiveness, accuracy, and readability of their responses.

METHODS: The artificial intelligence (AI) platforms answered 25 questions derived from the American Academy of Ophthalmology's Eye Health webpage. Three independent reviewers assigned comprehensiveness (0 to 5) and accuracy (-2 to 2) scores to each response. Readability metrics included Flesch-Kincaid Grade Level, Flesch Reading Ease Score, sentence/word statistics, and total content measures. Platforms were compared using Kruskal-Wallis and Friedman tests with post hoc analysis. Reviewer consistency was assessed using the intraclass correlation coefficient (ICC).

RESULTS: Google Gemini demonstrated the highest comprehensiveness and accuracy scores, significantly outperforming Microsoft Copilot (p<0.001). ChatGPT produced the most sentences and words (p<0.001), while readability metrics showed no significant differences among models (p>0.05). Inter-observer reliability was highest for Google Gemini (ICC=0.701), followed by ChatGPT (ICC=0.578), with Microsoft Copilot showing the lowest agreement (ICC=0.495). These results indicate Google Gemini's superior performance and consistency, whereas Microsoft Copilot had the weakest overall performance.

CONCLUSION: Google Gemini excelled in content volume while maintaining high comprehensiveness and accuracy, outperforming ChatGPT and Microsoft Copilot in content generation. The platforms displayed comparable readability and linguistic complexity. These findings inform AI tool selection in health-related contexts, emphasizing Google Gemini's strengths in detailed responses. Its superior performance suggests potential utility in clinical and patient-facing applications requiring accurate and comprehensive content.
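For readers unfamiliar with the two readability metrics named in the Methods, the following is a minimal sketch of how the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level can be computed from sentence, word, and syllable counts using their standard published formulas. The abstract does not say which tool the authors used, so this is not their implementation. The function names `count_syllables` and `readability` are hypothetical, and the vowel-group syllable counter is a rough assumption that will differ from dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a linguistic standard):
    count runs of consecutive vowels and drop a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return sentence/word/syllable counts and both Flesch metrics."""
    # Sentences are split on runs of '.', '!' or '?'; empty pieces are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic runs, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": syllables,
        # Flesch Reading Ease: higher scores mean easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    sample = "Dry eye is common. Artificial tears can relieve mild symptoms."
    for name, value in readability(sample).items():
        print(name, value)
```

Both metrics depend only on average sentence length and average syllables per word, so a platform can produce far more text than another and still receive a similar readability score.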
