Abstract
BACKGROUND: Heatstroke poses a significant threat to public health and is frequently fatal. This study aimed to develop and validate an interpretable machine learning (ML) model to predict heatstroke from clinical and laboratory data. METHODS: Data were collected from 24 hospitals between 2021 and 2023; data from 2021 and 2022 formed the training dataset, and data from 2023 were reserved for validation as the testing dataset. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), calibration plots, and decision curve analysis. The SHapley Additive exPlanations (SHAP) method was used to interpret the final model. RESULTS: The study included 691 patients, of whom 176 in the training dataset and 80 in the testing dataset were diagnosed with heatstroke. Among the nine ML models assessed, the gradient boosting machine (GBM) model performed best, achieving an AUROC of 0.971 in the training dataset and 0.836 in the testing dataset and showing substantial net benefit in decision curve analysis. Creatine kinase (CK)-MB was identified as the most influential variable in the GBM model's predictions. CONCLUSION: The ML model we developed demonstrates good predictive performance for heatstroke and may help clinicians identify and manage patients at elevated risk.
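The evaluation design described above (training on earlier years, testing on the most recent year, and scoring with AUROC) can be illustrated with a minimal sketch. It is not the study's code: the records, the scores, and the helper names `temporal_split` and `auroc` are hypothetical, and no model is trained here. The scores stand in for the output of any fitted classifier. AUROC is computed from its rank interpretation, i.e. the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
# Hypothetical toy records: (admission_year, model_score, has_heatstroke).
# All values are invented for illustration and are NOT study data.
records = [
    (2021, 0.91, 1), (2021, 0.20, 0),
    (2022, 0.75, 1), (2022, 0.40, 0),
    (2023, 0.80, 1), (2023, 0.35, 0),
    (2023, 0.55, 1), (2023, 0.60, 0),
]


def temporal_split(rows, test_year):
    """Split by year: earlier years for training, test_year for testing."""
    train = [r for r in rows if r[0] < test_year]
    test = [r for r in rows if r[0] == test_year]
    return train, test


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train, test = temporal_split(records, test_year=2023)
train_auroc = auroc([r[1] for r in train], [r[2] for r in train])
test_auroc = auroc([r[1] for r in test], [r[2] for r in test])
print(f"train AUROC = {train_auroc:.3f}, test AUROC = {test_auroc:.3f}")
```

Splitting by calendar year rather than at random keeps all 2023 patients out of model fitting, so the testing AUROC reflects performance on a later, unseen period. The pairwise loop is quadratic in the number of cases and is intended only to make the definition explicit; a rank-based implementation would be preferable for large cohorts.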