Abstract
Accurate prediction of lymph node metastasis (LNM) is critical for the staging and treatment planning of gastric cancer (GC). This study aimed to develop and validate a multi-module prediction model that integrates clinicopathological features and hematological biomarkers to improve the preoperative assessment of LNM in GC. We retrospectively analyzed GC patients treated at a single medical center. Clinical variables were grouped into five modules: basic demographic information, tumor characteristics, inflammation-related indicators, coagulation parameters, and nutritional-immune markers. An XGBoost machine learning model was constructed using 19 selected features, and its interpretability was assessed using SHapley Additive exPlanations (SHAP). Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity in the training (80%) and testing (20%) cohorts. Of the 1580 patients included, 984 (62.3%) had confirmed LNM. The optimized XGBoost model showed good discrimination, with an AUC of 0.883 (95% CI 0.864–0.902) in the training set and 0.815 (95% CI 0.767–0.863) in the testing set. SHAP analysis revealed distinct biomarker contribution patterns across T stages, Lauren classifications, and histological differentiation grades. In multivariate logistic regression, T4 stage (odds ratio [OR] = 16.091, P < 0.001) and poor differentiation (OR = 5.891, P < 0.05) were confirmed as independent risk factors for LNM. This interpretable, multi-module machine learning model offers a robust and convenient tool for predicting LNM in GC, supporting precise risk stratification and individualized treatment decisions. The observed heterogeneity in biomarker predictive patterns across pathological subtypes also provides novel insights into metastatic mechanisms and supports the development of personalized therapeutic strategies.
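For readers who want to see how the evaluation metrics named above are computed, the following is a minimal sketch and not the study's actual pipeline. It assumes a simple random 80/20 split (the abstract does not say whether the split was stratified) and a 0.5 decision threshold; the function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) index lists.

    Assumption: a plain random split; the study may have used another scheme.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Cohort size from the abstract; the split sizes follow from the 80/20 ratio.
train_idx, test_idx = split_indices(1580)

# Hypothetical toy data: 1 = LNM present, 0 = LNM absent.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.5]

toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
```

In practice the scores would be the predicted LNM probabilities from the fitted model on the testing cohort, and the confidence interval for the AUC would be obtained separately, for example by bootstrapping.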