Abstract
BACKGROUND: Invasive Klebsiella pneumoniae liver abscess syndrome (IKLAS) increases mortality risk and length of hospital stay in patients with pyogenic liver abscess (PLA). This study aimed to construct a nomogram that predicts the occurrence of IKLAS in PLA patients.

METHODS: We retrospectively analyzed data from PLA patients admitted to Tianjin Medical University General Hospital between January 2022 and May 2024. IKLAS was defined as a liver abscess with metastatic Klebsiella pneumoniae infection at other sites. Data were collected from the inpatient management system. To address class imbalance, the dataset was augmented using the Synthetic Minority Over-sampling Technique (SMOTE) and Random Over-Sampling Examples (ROSE). The augmented dataset was split into a training set and a validation set using R software. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used to select variables and to improve the model's explanatory power and stability. A logistic regression-based nomogram was then developed and evaluated using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). Internal validation was performed on the holdout dataset.

RESULTS: The study included 160 PLA patients, 24 with IKLAS and 136 without. Using SMOTE and ROSE, IKLAS cases were augmented to 120. Patients were divided into training (n = 180) and validation (n = 76) sets. Statistical analysis identified 12 significant factors: diabetes mellitus (DM), septic shock (SS), white blood cell count (WBC), neutrophil percentage (N%), lymphocyte percentage (L%), N%:L% ratio, C-reactive protein (CRP), alkaline phosphatase (ALKP), blood urea nitrogen (BUN), total cholesterol (TC), and high- and low-density lipoprotein (HDL, LDL). LASSO regression selected eight parameters, seven of which were included in the nomogram: SS, WBC, L%, CRP, ALKP, BUN, and LDL. TC was excluded because of its compositional relationship with LDL. The model showed good discrimination, with an AUC of 0.813 (95% CI: 0.752–0.834) in the training set and 0.853 (95% CI: 0.769–0.937) in the validation set. Calibration curves and DCA indicated good calibration and clinical utility, supporting the model's reliability for predicting IKLAS.

CONCLUSION: We developed a risk prediction model for IKLAS in patients with PLA. The predictors in the model are SS, WBC, L%, CRP, ALKP, BUN, and LDL. The model provides preliminary evidence for IKLAS risk stratification.

CLINICAL TRIAL NUMBER: Not applicable.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-026-12727-7.
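
The sample-size arithmetic and the AUC metric reported in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline, which used R with SMOTE and ROSE: plain random oversampling stands in for those methods, the data are synthetic, and the names `random_oversample` and `auc` are hypothetical helpers introduced only for this illustration.

```python
import random

def random_oversample(rows, labels, target_minority, seed=0):
    """Duplicate minority rows (label 1) at random until target_minority exist.

    Plain random oversampling: a simpler stand-in for SMOTE/ROSE (assumption).
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == 1]
    extra = [rng.choice(minority) for _ in range(target_minority - len(minority))]
    return rows + extra, labels + [1] * len(extra)

def auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic one-feature data with the abstract's class counts (24 vs 136).
rng = random.Random(42)
labels = [1] * 24 + [0] * 136
rows = [[rng.gauss(1.0 if y == 1 else 0.0, 1.0)] for y in labels]

# Augment the minority class to 120 cases: 136 + 120 = 256 records.
bal_rows, bal_labels = random_oversample(rows, labels, target_minority=120)

# Shuffle and split into 180 training and 76 validation records.
order = list(range(len(bal_rows)))
rng.shuffle(order)
train_idx, valid_idx = order[:180], order[180:]

# Use the raw feature as a risk score and compute AUC on the validation split.
valid_scores = [bal_rows[i][0] for i in valid_idx]
valid_labels = [bal_labels[i] for i in valid_idx]
valid_auc = auc(valid_scores, valid_labels)
print(f"validation AUC on synthetic data: {valid_auc:.3f}")
```

When oversampling is applied before the split, as in this sketch, copies of the same record can land in both sets, which tends to inflate validation AUC; oversampling only the training portion avoids that.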