Abstract
BACKGROUND: Lymph node metastasis (LNM) is a pivotal determinant of treatment strategy and prognosis in gastric cancer, yet predicting it accurately remains a significant challenge. METHODS: We combined machine learning methods with the SHapley Additive exPlanations (SHAP) framework to develop an integrated predictive model. The model uses preoperatively obtainable inflammatory indices to improve the accuracy of LNM prediction in gastric cancer patients. RESULTS: LNM was an independent prognostic risk factor for gastric cancer patients. Among the models compared, XGBoost performed best. In the training set, it achieved the highest AUC (0.705); in the test set, it achieved the highest AUC (0.695) and the lowest Brier score (0.218). In terms of feature importance, the platelet-to-lymphocyte ratio (PLR) was the most significant factor influencing LNM in gastric cancer patients. Screening of differentially expressed genes identified six genes with prognostic value for survival: IGFN1, CLEC11A, STC2, TFEC, MUC5AC, and ANOS1. CONCLUSION: The XGBoost model can predict LNM in gastric cancer patients based on inflammatory indices and peripheral lymphocyte subgroups. Combined with SHAP, it gives a more intuitive view of how each variable affects LNM. Among the inflammatory indices, PLR is the most important risk factor for LNM in gastric cancer patients.
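The two evaluation metrics reported above, AUC and the Brier score, can be illustrated with a minimal sketch. The function names `brier_score` and `roc_auc` and the toy labels and probabilities are assumptions made for illustration only; they are not the study's code or data, and a real analysis would more likely rely on an established library.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0 means perfectly calibrated, fully confident predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative case counts as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = lymph node metastasis, 0 = none (made-up values).
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.8, 0.3, 0.6, 0.4, 0.5, 0.2]

print("Brier score:", round(brier_score(y_true, y_prob), 3))
print("AUC:", round(roc_auc(y_true, y_prob), 3))
```

As a point of reference for reading both metrics, a model that always outputs a probability of 0.5 has an AUC of 0.5 and a Brier score of 0.25.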