Performance of the ChatGPT-5 Language Model in Solving a Specialty Examination in Balneology and Physical Medicine



Abstract

Background

In recent years, there has been a breakthrough in the development of advanced computational systems based on neural networks. One such system is ChatGPT, built on the GPT architecture first introduced in 2018; its potential was quickly recognized, leading to global popularity. Language models are increasingly capable of addressing complex problems, making them a promising tool to support the training of medical professionals. A particularly important aspect is their ability to pass medical examinations, such as the Polish Medical Final Examination (LEK) and the National Specialty Examination (PES), as well as international exams, including the United States Medical Licensing Examination (USMLE) and various specialty board examinations.

Objective

The objective of this study is to analyze the potential of the latest publicly available version of the ChatGPT-5 model in answering PES examination questions in balneology and physical medicine. The study focuses on the accuracy of the model's answers and on the confidence it assigns to its decisions, in order to assess its potential use as a supportive tool in medical education and specialty exam preparation.

Materials and methods

The experiment was based on the official Spring 2024 PES in Balneology and Physical Medicine, which consisted of 120 questions. The correctness of ChatGPT-5's answers was verified against the official key prepared by the Center for Medical Examinations (CEM), and the model's self-declared confidence was recorded on a 1-5 scale. Both the answer key and the question database were obtained from the official CEM website. Before testing, ChatGPT-5 was briefed on the examination rules and given the full set of questions in Polish. The questions were divided into two groups, clinical and theoretical. Two questions were excluded due to inconsistency with current medical knowledge, leaving 118 scored questions. Statistical analyses, including the chi-square test and the Mann-Whitney U test, were performed using Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) and GraphPad Prism (GraphPad Software, San Diego, CA, USA); a minimal code sketch of these tests follows the abstract.

Results

ChatGPT-5 answered 83 of the 118 scored questions correctly (70.34%), surpassing the passing threshold. No statistically significant difference in accuracy was observed between clinical and theoretical questions (p = 0.983), suggesting that any discrepancy is attributable to random variation rather than a true difference. Answer correctness was positively associated with the model's self-assessed confidence level (p = 0.029): the higher the declared confidence, the greater the likelihood of a correct response. The Mann-Whitney U test (p = 0.07) indicated that the difference in confidence between clinical and theoretical questions did not reach statistical significance (α = 0.05), although a trend toward a difference was observed.

Conclusions

ChatGPT-5 performed well enough to pass the specialty examination in Balneology and Physical Medicine. The model declared lower confidence on advanced clinical questions than on theoretical ones, and answer accuracy correlated with the declared confidence level. Although the Mann-Whitney U test (p = 0.07) did not confirm a statistically significant difference in confidence between the two question categories, it suggested a possible trend. Further expert evaluation is required before such models can be widely implemented in medical education.
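The abstract names the statistical tests but not the computation behind them. The sketch below is an editorial illustration, not the authors' code: it uses Python's scipy.stats in place of the Excel and GraphPad Prism workflow described above, and the per-question records (the clinical/theoretical split, correctness flags, and confidence scores) are hypothetical placeholders; only the aggregate figures (118 scored questions, 83 correct) come from the abstract.

```python
# Illustrative sketch of the reported statistical comparisons.
# HYPOTHETICAL per-question data: only the aggregates (118 scored
# questions, 83 correct) are taken from the abstract.
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(42)

# Assumed even split of the 118 scored questions (the abstract does not give it).
n_clinical, n_theoretical = 59, 59

# Simulated correctness flags consistent with ~70% overall accuracy.
correct_clin = rng.random(n_clinical) < 0.70
correct_theo = rng.random(n_theoretical) < 0.70

# Chi-square test: does accuracy differ between clinical and theoretical questions?
table = np.array([
    [correct_clin.sum(), (~correct_clin).sum()],
    [correct_theo.sum(), (~correct_theo).sum()],
])
chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"chi-square p = {p_chi2:.3f}   (abstract: p = 0.983)")

# Simulated self-declared confidence scores on the 1-5 scale.
conf_clin = rng.integers(1, 6, size=n_clinical)
conf_theo = rng.integers(1, 6, size=n_theoretical)

# Mann-Whitney U test: do confidence levels differ between the two categories?
u_stat, p_mwu = mannwhitneyu(conf_clin, conf_theo, alternative="two-sided")
print(f"Mann-Whitney U p = {p_mwu:.3f}   (abstract: p = 0.07)")

# Sanity check of the reported overall accuracy.
print(f"overall accuracy = {83 / 118:.2%}   (abstract: 70.34%)")
```

With the study's real per-question records (category, correctness, confidence), the same two calls would reproduce the comparisons reported in the Results; the simulated inputs here will of course yield different p-values.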
