Abstract
BACKGROUND AND OBJECTIVE: Generative artificial intelligence (AI) tools such as ChatGPT are increasingly integrated into healthcare, with potential to support clinical decision-making and improve patient outcomes. In palliative care, where access to multidisciplinary expertise is often limited, these tools may support symptom management. This study aimed to systematically compare ChatGPT-4o and ChatGPT-5 across common palliative care symptoms in four key domains: clinical appropriateness, safety, ethical sensitivity, and understandability.

METHODS: Clinical scenarios representing 10 key symptoms (pain, anxiety, pressure ulcer, nausea, delirium, dyspnea, constipation, diarrhea, dry mouth, and sleep disturbance) were presented first to ChatGPT-4o and, one week later, to ChatGPT-5. Responses were evaluated independently by two physicians using a five-point Likert scale. Inter-rater agreement was assessed with weighted Cohen's kappa and Spearman's correlation. Between-model comparisons were performed using the Friedman test, the Mann-Whitney U test, and the Wilcoxon signed-rank test.

RESULTS: Inter-rater agreement was consistently high across all domains (kappa 0.806-0.886, Spearman's rho 0.813-0.888; all p < 0.001). ChatGPT-5 outperformed ChatGPT-4o in clinical appropriateness (p = 0.010), safety (p = 0.002), and understandability (p = 0.011). Ethical sensitivity scores were high for both models, with no significant difference (p = 0.102).

CONCLUSIONS: ChatGPT-5 demonstrated measurable improvements over ChatGPT-4o in key domains of palliative care symptom management while maintaining consistently high ethical sensitivity. These findings provide the first systematic evidence that the updated ChatGPT-5 model, released in August 2025, has potential as a complementary and reliable clinical decision support tool in palliative care.
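The agreement and comparison statistics named in METHODS can be reproduced with standard Python libraries. The sketch below is illustrative only: the Likert ratings are invented, not the study's data, and the weighting scheme (linear) is an assumption, since the abstract does not specify which weights were used for Cohen's kappa.

```python
# Illustrative sketch of the abstract's statistics on HYPOTHETICAL data;
# the ratings below are invented, and linear kappa weights are assumed.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr, wilcoxon

# Hypothetical five-point Likert ratings from two physicians for 10 scenarios
rater_a = [5, 4, 4, 5, 3, 4, 5, 4, 3, 5]
rater_b = [5, 4, 5, 5, 3, 4, 4, 4, 3, 5]

# Weighted Cohen's kappa: chance-corrected inter-rater agreement
kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")

# Spearman's rho: rank correlation between the two raters
rho, rho_p = spearmanr(rater_a, rater_b)

# Paired comparison of the two models' scores on the same scenarios
# (hypothetical ChatGPT-4o vs. ChatGPT-5 ratings)
gpt4o = [3, 4, 3, 4, 3, 4, 3, 4, 3, 3]
gpt5 = [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]
stat, p = wilcoxon(gpt4o, gpt5)

print(f"kappa={kappa:.3f}, rho={rho:.3f}, wilcoxon p={p:.4f}")
```

The Wilcoxon signed-rank test is the appropriate paired test here because both models rate the same scenarios; the Mann-Whitney U test would instead apply to independent samples.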