Abstract
With the increasing frequency and intensity of heatwaves driven by climate change, heatstroke has emerged as a growing public health concern. As the most severe form of heat-related illness, heatstroke is frequently complicated by acute kidney injury (AKI), a major contributor to poor prognosis. Although AKI often develops in later stages, early detection is essential to reduce morbidity and mortality. This study aimed to develop and validate machine learning models that predict AKI from clinical data collected in the first 24 h of hospitalization, enabling timely intervention and improved outcomes.
We retrospectively collected data from 290 heatstroke patients admitted to 55 hospitals in China between 2008 and 2024. Variables included demographics, clinical features, comorbidities, vital signs, laboratory results, treatments, and complications. Data from the first 24 h of hospitalization were analyzed using univariate analysis, ROC curves, and collinearity testing to identify key predictors. These variables were used to build a logistic regression model and five machine learning models (Naive Bayes, decision tree, kNN, SVM, and XGBoost), with 20-fold cross-validation applied to reduce overfitting.
The cohort was predominantly male (90.69%) with a median age of 25 [21, 41] years, and AKI occurred in 57.93% of patients. Within the first 24 h of hospitalization, the AKI group showed significantly higher core temperatures and heart rates than the non-AKI group. The AKI group also exhibited elevated renal function markers and coagulation and inflammatory indicators, as well as more pronounced liver dysfunction and rhabdomyolysis. Among the six models, kNN achieved the best performance (AUC = 0.934 [0.909, 0.959]), with troponin T (TnT), D-dimer, myoglobin (Mb), and hematocrit (HCT) identified as key predictive features.
Based on clinical data from the first 24 h of hospitalization, the kNN model demonstrated the highest predictive performance for identifying heatstroke patients at risk of a rapid rise in serum creatinine or oliguria during hospitalization. TnT, D-dimer, Mb, and HCT were identified as key predictive variables.
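The evaluation scheme described above (a kNN classifier scored by cross-validated AUC) can be illustrated with a minimal sketch. This is not the study's code or data: the four features are synthetic stand-ins, and the neighbour count `k=5`, the per-fold standardization, the pooling of out-of-fold scores into a single AUC, and all function names are assumptions made for illustration. Only the fold count of 20 is taken from the abstract.

```python
import math
import random


def knn_score(train_X, train_y, x, k):
    """Fraction of positive labels among the k nearest training points."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    return sum(label for _, label in nearest[:k]) / k


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def standardize(train_X, test_X):
    """Scale features with the training fold's mean and SD only, to avoid leakage."""
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

    return scale(train_X), scale(test_X)


def cross_validated_auc(X, y, n_folds, k=5, seed=0):
    """Collect out-of-fold kNN scores for every sample, then compute one pooled AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_X, te_X = standardize([X[i] for i in train], [X[i] for i in fold])
        tr_y = [y[i] for i in train]
        for i, x in zip(fold, te_X):
            scores[i] = knn_score(tr_X, tr_y, x, k)
    return auc(y, scores)


def make_synthetic(n=200, seed=1):
    """Synthetic two-class data with 4 features; NOT patient data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


X, y = make_synthetic()
cv_auc = cross_validated_auc(X, y, n_folds=20)  # fold count as stated in the abstract
print(f"cross-validated AUC on synthetic data: {cv_auc:.3f}")
```

Standardizing inside each fold matters for kNN because distances are otherwise dominated by whichever laboratory value has the largest numeric range. The printed AUC describes only the synthetic data and says nothing about the reported 0.934.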