Abstract
Artificial intelligence (AI) chatbots have emerged as promising tools for enhancing medical communication, yet their efficacy in interpreting complex radiological reports remains underexplored. This study evaluates the performance of AI chatbots in translating magnetic resonance imaging (MRI) reports into patient-friendly language and providing clinical recommendations. A cross-sectional analysis was conducted on 6174 MRI reports from tumor patients across three hospitals. Two AI chatbots, GPT o1-preview (Chatbot 1) and Deepseek-R1 (Chatbot 2), were tasked with interpreting reports, classifying tumor characteristics, assessing surgical necessity, and suggesting treatments. Readability was measured using the Flesch-Kincaid and Gunning Fog metrics, while accuracy was evaluated by medical reviewers. Statistical analyses included Friedman and Wilcoxon signed-rank tests. Both chatbots significantly improved readability, with Chatbot 2 achieving higher Flesch-Kincaid Reading Ease scores (median: 58.70 vs. 46.00, p < 0.001) and lower text complexity on the Gunning Fog index. Chatbot 2 also outperformed Chatbot 1 in diagnostic accuracy (92.05% vs. 89.03% for tumor classification; 95.12% vs. 84.73% for surgical necessity, p < 0.001). Treatment recommendations from Chatbot 2 were more often clinically acceptable (98.10% vs. 75.41%), and both chatbots demonstrated high empathy (92.82-96.11%). Errors included misinterpretations of medical terminology and occasional hallucinations. AI chatbots, particularly Deepseek-R1, effectively enhance the readability and accuracy of MRI report interpretations for patients. However, physician oversight remains critical to mitigate errors. These tools hold potential to reduce healthcare burdens but require further refinement for clinical integration.
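For reference, the two readability metrics named above follow standard published formulas: Flesch Reading Ease = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), and Gunning Fog = 0.4 × [(words/sentences) + 100 × (complex words/words)], where "complex" means three or more syllables. The sketch below is a minimal illustration of these formulas, not the study's actual pipeline; the regex-based tokenizer and the vowel-group syllable counter are crude assumptions, and published work typically relies on established readability tooling.

```python
import re


def _count_syllables(word: str) -> int:
    # Crude heuristic (assumption): count vowel groups,
    # discounting a silent trailing 'e'. Real tools use
    # dictionaries or better phonetic rules.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Gunning Fog index) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(_count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if _count_syllables(w) >= 3)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Higher Flesch score = easier to read (patient-friendly text
    # typically scores 60+); higher Fog index = more complex text.
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fog = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    return flesch, fog
```

On these scales, Chatbot 2's higher median Flesch score (58.70 vs. 46.00) corresponds to plainer, shorter-worded sentences.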