Abstract
Sepsis is a life-threatening organ dysfunction syndrome caused by a dysregulated host response to infection. Although it is a leading cause of mortality among intensive care unit patients, sensitive biomarkers for sepsis are still lacking. This study therefore aimed to develop a diagnostic model for sepsis and to identify key driver biomarkers. Using single-cell RNA sequencing (scRNA-seq) data from the GEO database, we constructed a diagnostic model through 113 machine learning (ML) frameworks and applied Shapley additive explanations (SHAP) analysis to identify pivotal genes. The results revealed a significant increase in myeloid cells, particularly neutrophils, in the peripheral blood of sepsis patients. Screening identified 70 upregulated and 762 downregulated neutrophil-associated genes. Intersecting these with the differentially expressed genes (DEGs) between healthy controls and sepsis patients yielded 13 overlapping genes, including S100A12, as potential drivers. These 13 genes were used as input to the 113 ML models. The Random Forest (RF) model, comprising S100A12, PIK3AP1, HLA-DMB, and RETN, achieved the highest mean C-index with fewer features. Its robust diagnostic performance was validated using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis. SHAP analysis highlighted S100A12 as the most influential driver gene, and theophylline, aspirin, and aminophylline were identified as potential targeting compounds. In conclusion, sepsis patients show increased peripheral neutrophils, an RF model based on four neutrophil-associated genes demonstrates strong diagnostic ability, and S100A12 serves as a key biomarker for sepsis.
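The gene-screening step (intersecting neutrophil-associated genes with DEGs) and the C-index used to rank models can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes a binary outcome (sepsis = 1, control = 0), for which the C-index reduces to the area under the ROC curve. The function names `c_index` and `candidate_genes` and all example values are hypothetical placeholders.

```python
from itertools import product


def c_index(scores, labels):
    """Concordance index for a binary outcome (1 = sepsis, 0 = control).

    Counts case-control pairs in which the case receives the higher
    predicted score; tied scores count as half a concordant pair.
    For binary labels this equals the ROC AUC.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def candidate_genes(neutrophil_genes, sepsis_degs):
    """Genes that are both neutrophil-associated and differentially expressed."""
    return sorted(set(neutrophil_genes) & set(sepsis_degs))


if __name__ == "__main__":
    # Hypothetical gene lists and model scores, for illustration only.
    print(candidate_genes(["S100A12", "RETN", "GENE_X"], ["RETN", "S100A12", "GENE_Y"]))
    print(c_index([0.9, 0.8, 0.4, 0.3, 0.8], [1, 1, 0, 0, 0]))
```

In practice the scores would come from each fitted model (for example, RF-predicted probabilities on held-out samples), and the model with the highest mean C-index across datasets would be retained.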