Abstract
Early diagnosis of post-operative delirium (POD) in older surgical patients allows timely intervention and reduces morbidity. Risk prediction models (RPMs) that use machine learning (ML) have emerged as promising tools for predicting POD, but their performance and applicability in clinical settings remain uncertain. This systematic review evaluates the predictive accuracy and quality of RPMs for POD developed from 2014 to 2024, focusing on patients after non-cardiac surgery. PubMed and EMBASE were systematically searched for studies that developed RPMs predicting POD. Two authors independently screened 298 potential studies for eligibility, and quality was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Pooled performance metrics, including the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and precision, were calculated. Twenty-two articles met the review criteria, with the majority employing ML techniques such as gradient boosting and random forests. The pooled AUROC was 0.82 (95% CI: 0.79-0.85), indicating moderate-to-high predictive accuracy. Pooled sensitivity, specificity, and precision were 0.78, 0.83, and 0.55, respectively. Studies that used more predictors and more complex model architectures did not perform substantially better than simpler models developed before 2014. We found that, although newer RPMs for POD are more likely to be validated and to use advanced ML algorithms, their interpretability and clinical applicability remain limited. ML models hold promise for reducing the incidence of POD, but significant effort is needed to integrate them into clinical practice. Future work should focus on external validation, reducing false-positive predictions, and translating model predictions into clinical actions.
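To illustrate why a precision of 0.55 can coexist with sensitivity and specificity near 0.8, the sketch below applies Bayes' rule to the pooled sensitivity and specificity reported above. The prevalence values are hypothetical and chosen only for illustration; they are not taken from the included studies. Because the pooled metrics were estimated separately, they need not satisfy this identity exactly, so the output should be read as a rough guide rather than a reanalysis.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Pooled values reported in the abstract.
SENSITIVITY = 0.78
SPECIFICITY = 0.83

# Hypothetical POD prevalences (assumption, for illustration only).
for prevalence in (0.05, 0.10, 0.20, 0.30):
    ppv = precision_from_rates(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f} -> precision={ppv:.2f}")
```

Under these assumptions, precision falls as POD becomes rarer, which is one reason false-positive predictions may remain a concern even for models with good discrimination.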