Abstract
BACKGROUND: The number of risk prediction models for postoperative delirium (POD) has increased yearly, but their quality and their applicability to clinical practice and future research remain unclear. AIMS: This systematic review aimed to appraise published studies on POD risk prediction models, assess the diagnostic accuracy of these models, and provide guidance for model development and improvement. METHODS: We conducted a systematic search of PubMed, Embase, the Cochrane Library, and other scientific databases to identify eligible studies published up to February 15, 2025. Included studies reported the sensitivity and specificity of their predictive models. RESULTS: We included 17 articles describing 69 machine learning (ML) prediction models, encompassing 205,202 individuals who underwent surgery, among whom 50,899 developed POD (24.8%). The average number of cases per study was 12,070, with the largest study including 163,436 cases. The combined area under the receiver operating characteristic curve (AUROC) for predicting POD, calculated by averaging the AUROC values reported in the studies, was 0.83 [95% CI, 0.79-0.86], with a sensitivity of 0.73 [95% CI, 0.68-0.74] and a specificity of 0.79 [95% CI, 0.74-0.82]. Subgroup analysis showed that the random forest model had the highest pooled AUROC, at 0.89 [95% CI, 0.86-0.92]. Notably, models predicting POD after orthopedic surgery (AUROC: 0.88 [95% CI, 0.85-0.90]) and in patients younger than 60 years (AUROC: 0.84 [95% CI, 0.81-0.87]) demonstrated superior predictive performance. Furthermore, models that underwent only internal validation performed worse than those validated both internally and externally (AUROC: 0.84 [95% CI, 0.81-0.87]). Models applied to Asian populations performed better than those developed for European and American populations (AUROC: 0.85 [95% CI, 0.82-0.88]).
Across the included studies, advanced age, preoperative cognitive impairment, comorbidities, anemia, and hypoalbuminemia were the most consistently reported predictors of POD. CONCLUSIONS: ML models perform well in predicting the occurrence of POD, with stable performance across various surgical and demographic subgroups.
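As a reading aid, the descriptive figures quoted above can be recomputed from the reported totals, and the two accuracy measures required for inclusion can be written out from a confusion matrix. The following is a minimal sketch, not the authors' analysis code: the function names are illustrative, and the confusion-matrix counts in the last lines are hypothetical values chosen only to demonstrate the definitions.

```python
# Totals quoted in the abstract.
N_STUDIES = 17
N_PATIENTS = 205_202
N_POD = 50_899


def proportion(cases: int, total: int) -> float:
    """Share of `total` made up by `cases` (used here for POD incidence)."""
    return cases / total


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: POD cases correctly flagged by a model."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: non-POD patients correctly cleared by a model."""
    return tn / (tn + fp)


incidence = proportion(N_POD, N_PATIENTS)
mean_cases = N_PATIENTS / N_STUDIES

print(f"POD incidence: {incidence:.1%}")
print(f"Mean cases per study: {mean_cases:,.1f}")

# Hypothetical counts (assumption, not study data): 100 POD and 100 non-POD patients.
example_sens = sensitivity(tp=73, fn=27)
example_spec = specificity(tn=79, fp=21)
print(f"Example sensitivity: {example_sens:.2f}, specificity: {example_spec:.2f}")
```

The recomputed incidence agrees with the reported 24.8%, and the mean of roughly 12,070.7 cases per study matches the reported 12,070 when truncated to a whole number.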