Abstract
Background: Gastric cancer (GC) remains a major global health challenge, with rising incidence among patients who have undergone Helicobacter pylori (H. pylori) eradication, particularly those with persistent intestinal metaplasia (IM). Current risk stratification tools are limited in this high-risk population.
Aim: To develop, validate, and externally test a machine learning-based prediction model, termed the Early Gastric Cancer Model (EGCM), for identifying early gastric cancer (EGC) risk in H. pylori-eradicated patients with IM, and to implement it as a web-based clinical tool.
Methods: This retrospective, dual-center study enrolled 214 H. pylori-eradicated patients with histologically confirmed IM from 900 Hospital and Fujian Provincial People's Hospital. The dataset was split into a training cohort (70%) and an internal validation cohort (30%), with an external test cohort from the second center. A total of 21 machine learning algorithms were screened using cross-validation and hyperparameter optimization. Boruta and SHAP analyses were employed for feature selection, and the final EGCM was constructed using the top five predictors: atrophy range, xanthoma, map-like redness (MLR), MLR range, and age. Model performance was evaluated using ROC curves, precision-recall curves, calibration plots, and decision curve analysis (DCA), and compared against conventional inflammatory biomarkers such as NLR and PLR.
Results: The CatBoost algorithm demonstrated the best overall performance, achieving an AUC of 0.743 (95% CI: 0.70-0.80) in internal validation and 0.905 in the external test set. The EGCM showed superior discrimination compared with individual inflammatory markers (p < 0.01). Calibration analysis confirmed strong agreement between predicted and observed outcomes. DCA showed that the EGCM yielded greater net clinical benefit. A web calculator was developed to facilitate clinical application.
Conclusions: The EGCM is a validated, interpretable, and practical tool for stratifying EGC risk in H. pylori-eradicated IM patients across two centers. Its integration into clinical practice could improve surveillance precision and early cancer detection.
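
The decision curve analysis mentioned in the Methods compares strategies by their net benefit at a chosen risk threshold, using the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold. The following is a minimal sketch of that calculation, not the authors' code. The function names and the toy labels and probabilities are assumptions made for illustration and are not taken from the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    Standard decision-curve definition:
        NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example (hypothetical values, NOT study data): 1 = EGC, 0 = no EGC.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]

model_nb = net_benefit(y_true, y_prob, threshold=0.5)
treat_all_nb = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {model_nb:.3f}, treat-all: {treat_all_nb:.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the net benefit of treating no one. Repeating the calculation over a range of thresholds produces the decision curve.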