Abstract
This study aimed to use machine learning algorithms to develop a model predicting progression from severe community-acquired pneumonia (SCAP) to critical severe community-acquired pneumonia (cSCAP) in children. We retrospectively analyzed the clinical data of children with SCAP admitted to the Department of Pediatric Intensive Care Medicine at the First Affiliated Hospital of Bengbu Medical University between January 2021 and April 2023. Logistic regression (LR) and the least absolute shrinkage and selection operator (LASSO) were used together to screen candidate variables. The selected variables were then entered into seven algorithms, namely LR, decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), naive Bayes (NB), k-nearest neighbor (KNN), and support vector machine (SVM), to build models predicting progression to cSCAP. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. Finally, the Shapley Additive Explanations (SHAP) algorithm was used to interpret the resulting model. A total of 211 patients were included. Red cell distribution width-coefficient of variation (RDW-CV), procalcitonin (PCT), blood urea nitrogen (BUN), and lactate dehydrogenase (LDH) were selected as predictors. The XGBoost model outperformed the other six algorithms, with an AUC of 0.98 (95% CI, 0.93-1.00), accuracy of 0.89 (95% CI, 0.78-0.94), sensitivity of 0.98 (95% CI, 0.95-1.00), and specificity of 0.75 (95% CI, 0.45-0.87). SHAP analysis identified PCT, LDH, RDW-CV, and BUN as the most important contributors, supporting their clinical relevance for early risk stratification.
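The evaluation metrics listed above can be illustrated with a minimal, dependency-free sketch. It assumes binary labels (1 = progressed to cSCAP, 0 = did not), a probability cut-off of 0.5, and a percentile bootstrap for the AUC confidence interval. The abstract states neither the decision threshold nor how its 95% CIs were derived, so these choices, and the helper names `threshold_metrics`, `roc_auc`, and `bootstrap_auc_ci`, are hypothetical and are not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate (recall)
    specificity: float  # true negative rate
    ppv: float          # positive predictive value (precision)
    npv: float          # negative predictive value
    f1: float


def _ratio(num, den):
    # Return NaN rather than raising when a denominator is zero.
    return num / den if den else float("nan")


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics at a fixed cut-off (0.5 is an assumption)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    )


def roc_auc(y_true, y_score):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, for illustration)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples containing only one class
            aucs.append(roc_auc(yt, [y_score[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, with toy labels `[1, 1, 1, 0, 0, 0]` and predicted probabilities `[0.9, 0.8, 0.4, 0.6, 0.2, 0.1]`, `threshold_metrics` gives accuracy, sensitivity, specificity, PPV, NPV, and F1 all equal to 2/3, and `roc_auc` gives 8/9. These numbers are illustrative only and are unrelated to the study cohort. The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of a few hundred patients.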
This study used machine learning techniques to develop an accurate model for predicting progression to cSCAP in children, which can support clinicians in clinical decision-making.