Abstract
BACKGROUND: Endoscopic submucosal dissection (ESD) has become the preferred treatment for superficial oesophageal squamous cell carcinoma (SESCC). However, because the background mucosa remains after resection, some patients are still at risk of postoperative recurrence. This study aimed to develop and validate an explainable machine learning model to predict recurrence risk in SESCC patients undergoing ESD. METHODS: We enrolled SESCC patients treated with ESD between 1 January 2016 and 31 December 2023. Patients treated from 2016 to 2021 were allocated to the training set, which was further split in a 7:3 ratio for model training and internal validation. Patients treated from 2022 to 2023 constituted the temporal validation set. Seven machine learning algorithms - logistic regression, extreme gradient boosting, light gradient boosting machine, random forest (RF), gradient boosting decision tree, Gaussian Naïve Bayes and multilayer perceptron - were used to construct risk models. The Shapley Additive exPlanations (SHAP) method was applied to interpret feature importance in the optimal model. RESULTS: Eight predictors were identified for model construction: platelet-to-lymphocyte ratio, lymphocyte-to-monocyte ratio, low-density lipoprotein level, biological age, mid-oesophageal location, multiple tumour lesions, poor differentiation and infiltration depth (muscularis mucosae or superficial submucosa). Among the seven models, the RF algorithm performed best, achieving an area under the receiver operating characteristic curve of 0.892 in internal validation and 0.761 in temporal validation. An online prediction platform (https://zzhapp.shinyapps.io/shiny/) was developed for clinical application. CONCLUSION: This study developed and validated an explainable RF-based machine learning model to predict recurrence risk in SESCC patients after ESD.
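
The cohort partition and the discrimination metric described in METHODS and RESULTS can be illustrated with a minimal Python sketch. It is not the authors' code. The record field `esd_date`, the function names, the random seed and the use of an unstratified random 7:3 split are all assumptions made for illustration, and the abstract does not state how the split was performed. The sketch uses only the standard library and computes the area under the receiver operating characteristic curve through its rank interpretation: the probability that a randomly chosen patient with recurrence receives a higher predicted risk than a randomly chosen patient without recurrence.

```python
import random
from datetime import date


def temporal_split(records, cutoff_year=2021):
    """Split by ESD year: up to cutoff_year for development, later for temporal validation."""
    development = [r for r in records if r["esd_date"].year <= cutoff_year]
    temporal = [r for r in records if r["esd_date"].year > cutoff_year]
    return development, temporal


def split_7_3(records, seed=0):
    """Randomly split records 7:3 (assumption: simple unstratified shuffle)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical records: ten patients per year, 2016 to 2023 (synthetic, not study data).
records = [{"id": i, "esd_date": date(2016 + i % 8, 6, 1)} for i in range(80)]
development, temporal_validation = temporal_split(records)
training, internal_validation = split_7_3(development)
print(len(training), len(internal_validation), len(temporal_validation))

# Toy labels (1 = recurrence) and predicted risks, only to exercise roc_auc.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice the predicted risks would come from a fitted classifier such as a random forest, and the same `roc_auc` call would be applied separately to the internal and temporal validation sets.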