Abstract
Patients with sepsis and concurrent methicillin-resistant Staphylococcus aureus (MRSA) bloodstream infection face a substantial risk of mortality. This study aimed to develop and validate a pragmatic, interpretable prediction model for in-hospital mortality in this high-risk population. We conducted a retrospective, single-center cohort study of 1,605 eligible patients, who were randomly divided into a training set (n = 1,124) and a validation set (n = 481). A multi-stage feature selection pipeline (univariate analysis, multi-model machine learning importance assessment, the Boruta algorithm, and stability selection via LASSO) was applied to 64 candidate variables. This process identified a parsimonious set of seven clinically accessible predictors: Glasgow Coma Scale score, minimum pH, maximum blood urea nitrogen, minimum white blood cell count, minimum platelet count, maximum lactate level, and the presence of pneumonia. Among the six machine learning algorithms compared, logistic regression was selected for its balance of performance and inherent interpretability. The final model demonstrated strong discriminative ability, with an area under the receiver operating characteristic curve (AUC) of 0.861 (95% CI: 0.840–0.882) in the training set and 0.844 (95% CI: 0.811–0.877) in the held-out validation set, and showed good calibration (Hosmer-Lemeshow test p = 0.274). Decision curve analysis confirmed superior clinical net benefit across a wide range of risk thresholds. The model maintained robust and equitable performance across key patient subgroups and remained stable in extensive sensitivity analyses, including multiple imputation and bootstrap internal validation. This interpretable, seven-variable logistic regression model provides a clinically actionable tool for early mortality risk stratification, potentially supporting timely and tailored intervention strategies for sepsis patients with MRSA bacteremia.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-026-12584-4.
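For readers less familiar with the two headline metrics in the abstract, the following is a minimal sketch of how an AUC and a decision-curve net benefit can be computed from predicted risks. It is not the authors' code: the function names `auc` and `net_benefit`, and the eight toy labels and scores, are hypothetical and chosen only for illustration. The net benefit uses the standard decision-curve definition, TP/n - (FP/n) * pt/(1 - pt), where pt is the risk threshold.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit at one risk threshold (0 < threshold < 1).
    Patients with score >= threshold are treated as high risk."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data (NOT from the study): 1 = died in hospital.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

print(auc(labels, scores))
print(net_benefit(labels, scores, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the model against the "treat all" and "treat none" strategies; the abstract does not state which comparators or threshold range the authors used.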