Abstract
BACKGROUND: Postoperative intracranial infection is a serious complication strongly associated with poor prognosis in brain tumor patients. This study aimed to develop and validate machine learning (ML) models for predicting intracranial infection from readily accessible postoperative cerebrospinal fluid (CSF) parameters.

METHODS: We retrospectively analyzed 657 brain tumor patients, with an independent cohort (n = 116) for external validation. Key predictors were identified by feature selection using LASSO regression combined with random forest. Eleven ML models were trained on 70% of the data, with hyperparameter optimization via 10-fold cross-validation and bootstrap-based comparison, and then evaluated on the internal test set (30%) and the external validation set.

RESULTS: CSF polymorphonuclear cell percentage (PMN%), glucose (GLU) level, and color were identified as the most significant predictors of postoperative intracranial infection. Among the tested models, the Gradient Boosting Decision Tree (GBDT) showed the strongest predictive performance, with AUC values of 0.98 (training set), 0.94 (internal test set), and 0.91 (external validation set). The model also demonstrated excellent calibration, robust precision-recall performance, and meaningful clinical utility, as confirmed by decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) analysis further identified PMN% as the most influential predictor. Subgroup analyses indicated that the model maintained robust performance in most key clinical subgroups, though some variability was observed in patients with brain metastasis. To facilitate clinical application, we developed a user-friendly, web-based calculator for estimating individualized infection risk in brain tumor patients.

CONCLUSION: The GBDT-based model enables accurate prediction of postoperative intracranial infection from readily available CSF parameters (PMN%, GLU, color). Being rapid, objective, and interpretable, it facilitates early risk stratification and personalized clinical intervention.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12935-026-04242-1.
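
The workflow summarized in the abstract (feature screening, a 70/30 split, 10-fold cross-validated tuning of a GBDT, and AUC on held-out data) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, uses synthetic data in place of the CSF measurements, stands in an L1-penalised logistic regression for the LASSO step, and combines it with random forest by ranking the LASSO-retained features by forest importance and keeping the top three. That combination rule and the small hyperparameter grid are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of CSF parameters (NOT the study data).
X, y = make_classification(
    n_samples=657, n_features=10, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)

# 70% training / 30% internal test split, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Step 1a (assumption): LASSO-style screen via L1-penalised logistic regression
# fitted on standardized training features.
scaler = StandardScaler().fit(X_train)
lasso = LogisticRegressionCV(
    Cs=10, cv=5, penalty="l1", solver="liblinear",
    scoring="roc_auc", random_state=0,
)
lasso.fit(scaler.transform(X_train), y_train)
selected = np.flatnonzero(lasso.coef_.ravel() != 0)

# Step 1b (assumption): rank the retained features by random forest importance
# and keep at most three of them.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train[:, selected], y_train)
order = np.argsort(forest.feature_importances_)[::-1]
top = selected[order[:3]]

# Step 2: tune a gradient boosting classifier with 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},  # hypothetical grid
    scoring="roc_auc",
    cv=cv,
)
grid.fit(X_train[:, top], y_train)

# Step 3: discrimination (AUC) on the held-out internal test set.
auc_test = roc_auc_score(y_test, grid.predict_proba(X_test[:, top])[:, 1])
print("selected feature indices:", top.tolist())
print("best hyperparameters:", grid.best_params_)
print("internal test AUC:", round(auc_test, 3))
```

All feature selection and tuning here use only the training portion, so the test-set AUC is not influenced by the held-out data. An external cohort would be scored the same way as `X_test`, using the already fitted `grid`.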