Abstract
This study used interpretable machine learning approaches to investigate associations between complete blood cell count (CBC)-derived inflammatory parameters and cancer occurrence. Data from 35,591 participants in the National Health and Nutrition Examination Survey (NHANES), collected between 1999 and 2016, were analyzed. Four CBC-derived inflammatory biomarkers were examined: systemic inflammation response index (SIRI), neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), and neutrophil-monocyte-to-lymphocyte ratio (NMLR). The analytical framework comprised weighted multivariable logistic regression, restricted cubic spline modeling, threshold effect evaluation, and subgroup analyses. The machine learning assessment compared eight algorithms, using the Boruta method for feature selection and Shapley additive explanations (SHAP) for model interpretability. Participants with cancer had significantly higher levels of all four inflammatory markers than control subjects. In multivariable logistic regression, participants in the highest quartile had significantly higher odds of cancer than those in the lowest quartile: SIRI (OR = 1.32, P = .023), NLR (OR = 1.38, P = .002), MLR (OR = 1.26, P = .041), and NMLR (OR = 1.44, P = .001). Each biomarker showed a significant positive dose-response trend (all P ≤ .05). Restricted cubic spline analyses indicated nonlinear associations, with inflection points at 2.084, 4.308, 0.145, and 4.739 for SIRI, NLR, MLR, and NMLR, respectively. The random forest model performed best (AUC = 0.765), and SHAP analysis identified age as the strongest predictor, followed by MLR, lymphocyte count, and NMLR.
CBC-derived inflammatory biomarkers show notable nonlinear associations with cancer prevalence. Interpretable machine learning models can capture complex relationships that traditional statistical methods may miss, suggesting that these readily available biomarkers could improve cancer risk stratification and early detection strategies.
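The abstract does not state how the four indices are computed. The sketch below uses the definitions commonly given in the literature (NLR = neutrophils/lymphocytes, MLR = monocytes/lymphocytes, SIRI = neutrophils × monocytes/lymphocytes, NMLR = (neutrophils + monocytes)/lymphocytes). These formulas, the `CBC` class, and the `inflammatory_indices` function are illustrative assumptions, not the authors' code, and the counts are assumed to share one unit (for example, 10^3 cells/µL).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CBC:
    """Hypothetical container for differential counts (assumed 10^3 cells/uL)."""
    neutrophils: float
    lymphocytes: float
    monocytes: float


def inflammatory_indices(cbc: CBC) -> dict:
    """Compute the four CBC-derived indices under the assumed definitions."""
    if cbc.lymphocytes <= 0:
        # Every index divides by the lymphocyte count.
        raise ValueError("lymphocyte count must be positive")
    n, l, m = cbc.neutrophils, cbc.lymphocytes, cbc.monocytes
    return {
        "NLR": n / l,          # neutrophil-to-lymphocyte ratio
        "MLR": m / l,          # monocyte-to-lymphocyte ratio
        "SIRI": n * m / l,     # systemic inflammation response index
        "NMLR": (n + m) / l,   # neutrophil-monocyte-to-lymphocyte ratio
    }


# Made-up counts chosen for easy arithmetic, not study data.
example = inflammatory_indices(CBC(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5))
print(example)  # {'NLR': 2.0, 'MLR': 0.25, 'SIRI': 1.0, 'NMLR': 2.25}
```

Under these assumed definitions all four indices share the lymphocyte count as denominator, so they are strongly correlated with one another. This is one reason a feature selection step such as Boruta and an attribution method such as SHAP are useful when the indices enter the same model.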