Comparison of ChatGPT and DeepSeek large language models in the diagnosis of pericarditis

Abstract

BACKGROUND: The integration of sophisticated large language models (LLMs) into healthcare has recently attracted significant attention because these models use deep learning techniques to process vast datasets and generate contextually accurate, human-like responses. LLMs have previously been applied in medical diagnostics, for example in the evaluation of oral lesions. Given the high rate of missed diagnoses of pericarditis, LLMs may help clinicians generate differential diagnoses, particularly in atypical cases, where risk stratification and early identification are critical to preventing serious complications such as constrictive pericarditis and pericardial tamponade.
AIM: To compare the accuracy of LLMs, used as risk stratification tools, in assisting the diagnosis of pericarditis.
METHODS: A PubMed search was conducted using the keyword "pericarditis" with the "case reports" filter applied, and data were extracted from relevant cases. Inclusion criteria were English-language reports of patients aged 18 years or older with a confirmed diagnosis of acute pericarditis. The diagnostic capabilities of ChatGPT o1 and DeepThink-R1 were assessed by determining whether each model included pericarditis among its top three differential diagnoses and whether it gave pericarditis as its sole provisional diagnosis. Each case was scored "yes" or "no" according to whether pericarditis was included.
RESULTS: The initial search identified 220 studies, of which 16 case reports met the inclusion criteria. In assessing risk stratification for acute pericarditis, ChatGPT o1 correctly identified the condition in the differential diagnosis in 10 of 16 cases (62.5%) and as the provisional diagnosis in 8 of 16 cases (50.0%). DeepThink-R1 identified it in 8 of 16 cases (50.0%) and 6 of 16 cases (37.5%), respectively. ChatGPT o1 demonstrated higher accuracy than DeepThink-R1 in identifying pericarditis.
CONCLUSION: Further research with larger sample sizes and optimized prompt engineering is warranted to improve diagnostic accuracy, particularly in atypical presentations.
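The scoring described under METHODS reduces to a yes/no check per case followed by a simple proportion. The Python sketch below illustrates that arithmetic under stated assumptions: the helper names `in_top_three`, `is_provisional` and `accuracy_percent` are hypothetical, and the case-insensitive substring match on "pericarditis" is an assumed rule, not the authors' procedure (the abstract does not say how model responses were judged). Only the aggregate counts reported under RESULTS are used; no per-case data are implied.

```python
from typing import Sequence


def in_top_three(differentials: Sequence[str], target: str = "pericarditis") -> bool:
    """Hypothetical rule: 'yes' if the target term appears in any of the first
    three ranked differential diagnoses (case-insensitive substring match)."""
    return any(target in dx.lower() for dx in differentials[:3])


def is_provisional(provisional: str, target: str = "pericarditis") -> bool:
    """Hypothetical rule: 'yes' if the single provisional diagnosis names the target."""
    return target in provisional.lower()


def accuracy_percent(hits: int, total: int) -> float:
    """Share of cases scored 'yes', as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, 1)


# Aggregate counts taken from RESULTS (16 included case reports).
REPORTED = {
    ("ChatGPT o1", "top-3 differential"): 10,
    ("ChatGPT o1", "provisional"): 8,
    ("DeepThink-R1", "top-3 differential"): 8,
    ("DeepThink-R1", "provisional"): 6,
}

if __name__ == "__main__":
    for (model, criterion), hits in REPORTED.items():
        print(f"{model:13} {criterion:18} {hits}/16 = {accuracy_percent(hits, 16):.1f}%")
```

Note that a plain substring rule would also score "constrictive pericarditis" as a hit, so a real evaluation would likely need a stricter, clinician-adjudicated criterion.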