Abstract
IMPORTANCE: Validation of prognostic tools is an essential step before they are integrated into clinical practice.
OBJECTIVE: To validate 4 delirium risk stratification tools in an independent emergency department (ED) cohort.
DESIGN, SETTING, AND PARTICIPANTS: This prognostic study included a retrospective cohort of patients 65 years or older who presented to an academic ED between January 1, 2021, and December 31, 2024. Data were analyzed from February 10 to August 21, 2025.
EXPOSURE: Delirium screening consisted of the Delirium Triage Screen followed by the Brief Confusion Assessment Method as part of the standard of care. Four delirium risk scores were calculated: the Kennedy rule, the Zucchelli rule, the Mayo Delirium Prediction (MDP) tool, and the Recognizing Delirium in Emergency Medicine (REDEEM) score.
MAIN OUTCOMES AND MEASURES: Main outcomes included risk score performance and diagnostic test accuracy measured using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, negative (NPV) and positive (PPV) predictive values, and positive and negative likelihood ratios (LRs) with 95% CIs. Secondary outcomes were discriminative and calibration measures.
RESULTS: A total of 44 578 patients were included, of whom 1701 (3.8%) were diagnosed with delirium. The median age was 80.0 (IQR, 75.0-85.0) years, and 22 786 patients (51.1%) were female. The Kennedy rule had an AUROC of 0.777 (95% CI, 0.766-0.789). At a cutoff of 5, sensitivity was 0.55 (95% CI, 0.52-0.57); specificity, 0.85 (95% CI, 0.84-0.85); positive LR, 3.54 (95% CI, 3.37-3.72); negative LR, 0.54 (95% CI, 0.51-0.57); PPV, 0.12 (95% CI, 0.12-0.13); and NPV, 0.98 (95% CI, 0.98-0.98). The Zucchelli rule had an AUROC of 0.701 (95% CI, 0.686-0.713). At a cutoff of 5, sensitivity was 0.68 (95% CI, 0.66-0.70); specificity, 0.66 (95% CI, 0.65-0.66); positive LR, 1.99 (95% CI, 1.92-2.06); negative LR, 0.49 (95% CI, 0.45-0.52); PPV, 0.07 (95% CI, 0.07-0.08); and NPV, 0.98 (95% CI, 0.98-0.98). The MDP tool had an AUROC of 0.898 (95% CI, 0.891-0.905). At a 30% cutoff, sensitivity was 0.51 (95% CI, 0.48-0.53); specificity, 0.95 (95% CI, 0.95-0.95); positive LR, 9.69 (95% CI, 9.11-10.31); negative LR, 0.52 (95% CI, 0.50-0.55); PPV, 0.28 (95% CI, 0.26-0.29); and NPV, 0.98 (95% CI, 0.98-0.98). The REDEEM score had the highest AUROC at 0.921 (95% CI, 0.914-0.929). At a cutoff of 11 or greater, sensitivity was 0.83 (95% CI, 0.81-0.85); specificity, 0.92 (95% CI, 0.91-0.92); positive LR, 9.91 (95% CI, 9.54-10.29); negative LR, 0.18 (95% CI, 0.17-0.20); PPV, 0.28 (95% CI, 0.27-0.29); and NPV, 0.99 (95% CI, 0.99-0.99).
CONCLUSIONS AND RELEVANCE: In this prognostic study comparing 4 delirium risk stratification tools, the REDEEM score and the MDP tool performed best for delirium detection in the ED. These findings lay the groundwork for integrating validated risk stratification into ED workflows to improve early delirium detection and inform prevention strategies for high-risk older adults.
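The accuracy measures reported above are linked by standard identities: likelihood ratios follow from sensitivity and specificity alone, while PPV and NPV also depend on delirium prevalence. The Python sketch below illustrates those identities using the rounded REDEEM values from this abstract. It is not the study's analysis code, and the function name `accuracy_from_rates` is a hypothetical helper introduced only for illustration. Because the inputs are rounded to 2 decimals, the outputs only approximate the reported estimates, which were presumably computed from patient-level counts.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Derive likelihood ratios and predictive values from three rates.

    Hypothetical helper for illustration; uses the standard identities
    LR+ = sens / (1 - spec), LR- = (1 - sens) / spec, and Bayes' rule
    for PPV and NPV at a given prevalence.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity

    # Expected proportions of each cell of the 2x2 table in the cohort.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)

    return {
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Prevalence from the abstract: 1701 delirium cases among 44 578 patients.
prevalence = 1701 / 44578

# Rounded REDEEM sensitivity and specificity at a cutoff of 11 or greater.
redeem = accuracy_from_rates(0.83, 0.92, prevalence)

for name, value in redeem.items():
    print(f"{name}: {value:.2f}")
```

The low PPV alongside a high positive LR reflects the 3.8% prevalence: even a specific score produces many false positives when the condition is uncommon, whereas NPV stays near 0.99 for the same reason.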