Comparative evaluation of viral hepatitis question responses: ChatGPT-4.5 outperforms three established models

Authors: Ma, Juntao; Gong, Linyan; Song, Yuchen; Wang, Guiyang; Xia, Juan; Cheng, Xiaofeng; Liu, Yun; Jia, Bei; Chen, Yuxin

Journal: BMC Medical Informatics and Decision Making
Impact factor: 3.800
Year: 2025
Issue details: 2025 Nov 26;25(1):429
DOI: 10.1186/s12911-025-03273-4
Research areas: microbiology, toxicology
Disease type: hepatitis

Abstract

BACKGROUND: Viral hepatitis is a major global public health problem affecting millions of people, so accurate and accessible information is essential for the general public and for non-specialist healthcare providers to understand, prevent, and manage the disease correctly. This study evaluated four large language models (LLMs), Gemini-2.0, Claude-3.5-sonnet, ChatGPT-4.5, and ChatGPT-4, by comparing their responses to viral hepatitis-related questions to assess differences in performance across models.

METHODS: This comparative evaluation study was conducted at Nanjing Drum Tower Hospital from March to April 2025. The four LLMs were assessed on their responses to 52 viral hepatitis questions spanning four domains: concepts, risk factors, diagnosis, and prevention and treatment. Responses were first rated on a three-point scale (good, borderline, or poor) and then scored from 1 to 5 on each of five criteria: relevance, comprehensiveness, accuracy, safety, and readability.

RESULTS: ChatGPT-4.5 performed best, with 89.1% of its responses rated good, significantly outperforming Claude-3.5-sonnet (71.15% good), Gemini-2.0 (62.82% good), and ChatGPT-4 (50.64% good). Statistical analysis confirmed the superior performance of ChatGPT-4.5 in all evaluated dimensions, and it achieved the highest scores on all five criteria: relevance, comprehensiveness, accuracy, safety, and readability.

CONCLUSIONS: ChatGPT-4.5 outperformed the other three models in answering viral hepatitis questions. Its high reliability makes it a valuable tool for improving access to information for patients and for medical professionals who do not specialize in viral hepatitis.
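The Methods describe a two-stage rating scheme: a three-point overall rating, then 1 to 5 scores on five criteria. As a minimal sketch, assuming each rating is stored as a plain label or integer, the Python below tallies the share of "good" ratings per model and the mean score per criterion. The model names, ratings, and helper functions (`percent_good`, `criterion_means`) are hypothetical toy values, not data or code from the study, and the sketch does not reproduce the authors' statistical tests, which the abstract does not name.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy ratings for illustration only; NOT the study's data.
# Stage 1: one three-point label per rated response.
OVERALL_RATINGS = {
    "model_a": ["good", "good", "borderline", "good", "poor", "good"],
    "model_b": ["good", "borderline", "borderline", "poor", "good", "poor"],
}

# Stage 2: 1-5 scores on the five criteria named in the abstract,
# one list entry per rated response (also hypothetical values).
CRITERIA = ("relevance", "comprehensiveness", "accuracy", "safety", "readability")
CRITERION_SCORES = {
    "model_a": {"relevance": [5, 4], "comprehensiveness": [4, 4],
                "accuracy": [5, 4], "safety": [5, 5], "readability": [4, 5]},
    "model_b": {"relevance": [4, 3], "comprehensiveness": [3, 3],
                "accuracy": [4, 3], "safety": [4, 4], "readability": [4, 4]},
}

LABELS = {"good", "borderline", "poor"}


def percent_good(ratings):
    """Return the percentage of three-point ratings labelled 'good'."""
    if not ratings:
        raise ValueError("no ratings supplied")
    unknown = set(ratings) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return 100 * Counter(ratings)["good"] / len(ratings)


def criterion_means(scores):
    """Return the mean 1-5 score per criterion, rejecting out-of-range values."""
    result = {}
    for criterion in CRITERIA:
        values = scores[criterion]
        if any(not 1 <= v <= 5 for v in values):
            raise ValueError(f"{criterion}: scores must lie in 1..5")
        result[criterion] = mean(values)
    return result


if __name__ == "__main__":
    for model, ratings in OVERALL_RATINGS.items():
        print(f"{model}: {percent_good(ratings):.2f}% good")
        for criterion, avg in criterion_means(CRITERION_SCORES[model]).items():
            print(f"  {criterion}: {avg:.2f}")
```

Keeping the two stages as separate functions mirrors the abstract's description of an initial three-point rating followed by per-criterion scoring; any significance testing would be a further step on top of these summaries.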