Exploring the association between complete blood cell count-derived inflammatory biomarkers and cancer incidence through interpretable machine learning models: A study based on NHANES 1999 to 2016



Abstract

This study used interpretable machine learning to investigate associations between complete blood cell count (CBC)-derived inflammatory parameters and cancer occurrence. Data from 35,591 National Health and Nutrition Examination Survey (NHANES) participants collected between 1999 and 2016 were analyzed. Four CBC-derived inflammatory biomarkers were examined: systemic inflammation response index (SIRI), neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), and neutrophil-monocyte-lymphocyte ratio (NMLR). The analytical framework comprised weighted multivariable logistic regression, restricted cubic spline modeling, threshold effect evaluation, and subgroup analyses. The machine learning assessment compared 8 algorithms, using the Boruta method for feature selection and Shapley additive explanations (SHAP) for model interpretability. Participants with cancer had significantly higher levels of all four inflammatory markers than control subjects. In multivariable logistic regression, participants in the highest quartile had significantly higher odds of cancer than those in the lowest quartile: SIRI (OR = 1.32, P = .023), NLR (OR = 1.38, P = .002), MLR (OR = 1.26, P = .041), and NMLR (OR = 1.44, P = .001). Each biomarker showed a significant positive dose-response trend (all P ≤ .05). Restricted cubic spline analyses indicated nonlinear associations, with inflection points at 2.084, 4.308, 0.145, and 4.739 for SIRI, NLR, MLR, and NMLR, respectively. The random forest model performed best (AUC = 0.765), and SHAP analysis identified age as the strongest predictor, followed by MLR, lymphocyte count, and NMLR. CBC-derived inflammatory biomarkers show notable nonlinear associations with cancer prevalence. Interpretable machine learning models reveal complex relationships that traditional statistical methods may not capture, indicating that these readily available biomarkers could improve cancer risk stratification and early detection strategies.
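The abstract names the four biomarkers but does not give their formulas. The sketch below shows how they are commonly computed from absolute CBC counts. It is an illustration rather than the authors' code: the definitions are the conventional ones from the wider literature and are assumed here, and the `CBC` class and `inflammatory_indices` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CBC:
    """Absolute counts from a complete blood count, all in the same unit
    (for example 10^3 cells/uL). Hypothetical container for illustration."""
    neutrophils: float
    lymphocytes: float
    monocytes: float


def inflammatory_indices(cbc: CBC) -> dict[str, float]:
    """Compute the four CBC-derived indices under the assumed definitions:

    NLR  = neutrophils / lymphocytes
    MLR  = monocytes / lymphocytes
    SIRI = neutrophils * monocytes / lymphocytes
    NMLR = (neutrophils + monocytes) / lymphocytes
    """
    if cbc.lymphocytes <= 0:
        # Every index divides by the lymphocyte count.
        raise ValueError("lymphocyte count must be positive")
    n, l, m = cbc.neutrophils, cbc.lymphocytes, cbc.monocytes
    return {
        "NLR": n / l,
        "MLR": m / l,
        "SIRI": n * m / l,
        "NMLR": (n + m) / l,
    }


if __name__ == "__main__":
    # Made-up example values, not NHANES data.
    example = CBC(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5)
    for name, value in inflammatory_indices(example).items():
        print(f"{name}: {value:.3f}")
```

All four indices share the lymphocyte count as denominator, so they are strongly correlated with one another. This may be one reason the SHAP ranking lists lymphocyte count alongside MLR and NMLR.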
