Abstract
BACKGROUND: Early triage for massive transfusion (MT) is essential in trauma care, but most existing scoring systems rely on in-hospital data. To address this limitation, a machine learning model that uses only prehospital variables to predict MT and stratify mortality risk was developed and externally validated.
METHODS: Data from the Korean Trauma Data Bank from 19 trauma centres (2017-22) were used for model development and internal validation, and 2023 data for patients from four additional centres were used for external validation. Trauma cases were identified using S or T codes from the Korean Classification of Diseases, 7th edition. MT was defined as transfusion of ≥ 5 units of packed red blood cells within 4 hours or ≥ 10 units within 24 hours. Machine learning models were trained using 21 prehospital variables, with a final ensemble model constructed from the top-performing algorithms. Model interpretability was assessed using Shapley additive explanations (SHAP), and the association between predicted probability tertiles (T1-T3) and in-hospital mortality was evaluated using logistic regression.
RESULTS: In all, 227 567 patients were included in the development and internal validation cohorts, and 8867 patients in the external validation cohort. The soft-voting ensemble model, combining random forest and AdaBoost, showed high predictive performance, with an area under the receiver operating characteristic curve of 0.837 in both internal and external validation. SHAP analysis identified accident type as the most influential predictor, followed by consciousness level and circulatory assistance. Higher predicted probability was associated with increased in-hospital mortality (adjusted odds ratios (95% confidence intervals) 2.34 (2.16 to 2.55), 2.70 (2.49 to 2.92), and 3.53 (3.25 to 3.83) for T1, T2, and T3, respectively).
CONCLUSION: A prehospital ensemble learning model to predict MT was developed and validated, and its predictions were significantly associated with in-hospital mortality. However, this study is limited by the inclusion of a single ethnicity, and future research needs to integrate data from multiple populations to enhance generalizability.
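The outcome definition and the soft-voting step described above can be illustrated with a minimal sketch. It is not the study's implementation: the function and parameter names are hypothetical, and the equal default weighting of the two base models is an assumption, since the abstract does not report ensemble weights.

```python
from typing import Optional, Sequence


def is_massive_transfusion(prbc_units_4h: float, prbc_units_24h: float) -> bool:
    """Label a case as MT using the definition stated in the abstract:
    >= 5 units of packed red blood cells within 4 hours,
    or >= 10 units within 24 hours."""
    return prbc_units_4h >= 5 or prbc_units_24h >= 10


def soft_vote(
    probabilities: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-model predicted probabilities of MT by (weighted) averaging.

    Soft voting averages predicted probabilities rather than counting class
    votes. Equal weights are an assumption made here for illustration.
    """
    if not probabilities:
        raise ValueError("at least one model probability is required")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical patient: 6 units in the first 4 hours, 8 units by 24 hours.
    print(is_massive_transfusion(prbc_units_4h=6, prbc_units_24h=8))

    # Hypothetical outputs from a random forest and an AdaBoost classifier.
    print(soft_vote([0.2, 0.6]))
```

In this sketch the averaged probability is the quantity that would subsequently be split into tertiles (T1-T3) for the mortality analysis; the tertile cut-offs themselves are not given in the abstract and are therefore not reproduced here.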