Evaluation of accuracy, quality, and readability of information on hypothyroidism provided by different artificial intelligence chatbot models



Abstract

OBJECTIVE: This study assessed the accuracy, quality, and readability of responses from three leading AI chatbots (ChatGPT-3.5, DeepSeek-V3, and Google Gemini-2.5) on the diagnosis, treatment, and long-term risks of adult hypothyroidism, comparing their outputs with current clinical guidelines.

METHODS: Two thyroid specialists developed 27 questions based on the Guideline for the Diagnosis and Management of Hypothyroidism in Adults (2017 edition), covering three categories: diagnosis, treatment, and long-term health risks. Responses from each AI model were independently evaluated by two reviewers. Accuracy was rated on a six-point Likert scale, quality with the DISCERN tool and a five-point Likert scale, and readability with the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG).

RESULTS: All three AI models performed well on accuracy (mean score > 4.5) and quality (high-quality rate > 94%). According to the DISCERN tool, overall information quality did not differ significantly among the models. However, Gemini-2.5 generated significantly lower-quality responses to treatment-related questions than to diagnostic questions. The content generated by all models was relatively difficult to comprehend (low FRE scores and high FKGL/GFI scores), generally requiring a college-level education or higher for adequate understanding.

CONCLUSION: All three AI chatbots produced highly accurate, high-quality medical information on hypothyroidism, and their responses were strongly consistent with clinical guidelines. This underscores the substantial potential of AI in supporting medical information delivery. However, the consistently high reading difficulty of their outputs may limit their practical utility in patient education. Future research should focus on improving the readability and patient-friendliness of AI outputs, for example through prompt engineering and multi-round dialogue optimization, while maintaining professional accuracy, to enable broader application of AI in health education.
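The abstract does not say which software was used to compute the four readability indices, so the following is only a minimal sketch of how they are conventionally calculated from sentence, word, and syllable counts. The function names `count_syllables` and `readability` are hypothetical, the syllable counter is a rough vowel-group heuristic, and "complex word" is simplified to "three or more syllables"; dedicated readability tools will give somewhat different values.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the study's method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return FRE, FKGL, GFI, and SMOG using the standard published formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)
    # Simplification: GFI's "complex words" and SMOG's "polysyllables"
    # are both approximated as words with three or more syllables.
    polysyllables = sum(1 for s in syllables if s >= 3)

    words_per_sentence = n_words / n_sent
    syllables_per_word = sum(syllables) / n_words

    return {
        # Higher FRE means easier text; the other three approximate
        # the school grade level needed to understand it.
        "FRE": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "FKGL": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "GFI": 0.4 * (words_per_sentence + 100 * polysyllables / n_words),
        "SMOG": 1.0430 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291,
    }
```

Calling `readability()` on a chatbot response returns all four scores at once. Long sentences and polysyllabic clinical terms such as "hypothyroidism" push FRE down and the grade-level indices up, which is the pattern the results describe.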
