Abstract
Background/Objectives: Machine learning (ML) models can predict hospital admission from emergency department (ED) triage data with areas under the receiver operating characteristic curve (AUC) exceeding 0.85. Whether incorporating the assigned provider's identity, as a proxy for unmeasured practice variation, improves prediction has not been systematically studied. We aimed to compare 10 supervised ML classifiers for predicting hospital admission at ED triage, with and without provider identity, and to characterize model reasoning using SHapley Additive exPlanations (SHAP).

Methods: We conducted a retrospective cohort study of 186,094 ED visits (2020-2023, training) and 58,151 visits (2024, temporal holdout test) at one academic tertiary-care ED. Ten classifiers spanning linear, distance-based, tree-based, ensemble, probabilistic, and neural network families were each trained in two conditions: baseline (23 triage features) and with provider identity appended. SHAP TreeExplainer was applied to the top-performing models (CatBoost and XGBoost).

Results: The admission rate was 31.3% (training) and 31.7% (test). CatBoost achieved the highest baseline AUC of 0.8906 (0.8878-0.8933). Adding provider identity produced negligible AUC changes across all models (ΔAUC range: -0.0029 to +0.0015; all DeLong p > 0.05). SHAP analysis identified ESI level, respiratory rate, temperature, complaint category, and age as the dominant predictors, with clinically intuitive directionality.

Conclusions: Provider identity does not meaningfully improve ML prediction of hospital admission beyond standard triage variables. The observed 28-percentage-point variation in provider admission rates is better explained by differences in patient case-mix than by independent practice-pattern effects on prediction. SHAP provides transparent, clinically interpretable explanations suitable for bedside decision support.
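The two-condition comparison described above (baseline triage features vs. baseline plus provider identity, evaluated by AUC on a held-out split) can be sketched in miniature. This is a hypothetical illustration on synthetic data, not the study's pipeline: scikit-learn's GradientBoostingClassifier stands in for CatBoost/XGBoost, feature counts are reduced, and the DeLong test and SHAP analysis are omitted.

```python
# Minimal sketch: does appending a provider-identity column change AUC?
# Synthetic data; admission is driven only by triage-like features, so the
# provider ID carries no independent signal (mirroring the paper's finding).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                        # stand-in for 23 triage features
provider = rng.integers(0, 40, size=n)             # assigned provider ID (hypothetical)
logit = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2]    # admission risk from triage only
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

split = int(0.8 * n)                               # simple holdout (temporal in the study)
Xp = np.column_stack([X, provider])                # condition 2: provider identity appended
y_tr, y_te = y[:split], y[split:]

base = GradientBoostingClassifier(random_state=0).fit(X[:split], y_tr)
with_id = GradientBoostingClassifier(random_state=0).fit(Xp[:split], y_tr)

auc_base = roc_auc_score(y_te, base.predict_proba(X[split:])[:, 1])
auc_id = roc_auc_score(y_te, with_id.predict_proba(Xp[split:])[:, 1])
print(f"baseline AUC={auc_base:.3f}  +provider AUC={auc_id:.3f}  "
      f"delta={auc_id - auc_base:+.4f}")
```

On data generated this way the delta is near zero, since the provider column adds no information beyond case-mix; a paired comparison such as DeLong's test would formalize that, as in the study.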