Abstract
Evaluation of invasion depth is essential for determining the treatment strategy for esophageal squamous cell carcinoma (ESCC). However, applying the Japan Esophageal Society classification system, which is based on the patterns of intrapapillary capillary loops (IPCL) and avascular areas (AVA), requires long-term training for endoscopists. We aimed to develop explainable semi-supervised models for predicting ESCC invasion depth based on IPCL/AVA patterns. A total of 2,643 magnifying endoscopy with narrow-band imaging images from Suzhou were used: 2,175 for the upstream task (self-supervised contrastive learning) and 468 for the downstream task (fine-tuning). In fine-tuning, two approaches were adopted: a traditional black-box approach and an explainable AI approach. The models were then evaluated on an external test dataset (Jintan, n = 60) and compared with two endoscopists. The primary outcome was 3-way classification of ESCC invasion depth. The metrics included accuracy, Matthews correlation coefficient, and Cohen's kappa. Furthermore, Grad-CAM was used for visual explanation of images; local interpretation, feature importance, and partial dependence plots were generated for the classifiers; and t-SNE was used to visualize feature vectors. An Xception-backboned explainable model (accuracy 0.817) performed better than the other models and a junior endoscopist (0.733), although it underperformed a senior endoscopist (0.883) by 0.066 in accuracy. However, both endoscopists' performance improved with AI assistance (junior 0.833 and senior 0.917). The explainable semi-supervised framework enables AI models to achieve improved transparency and performance, addressing the opacity of traditional supervised learning and the limited availability of labelled endoscopic images.
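For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, the multiclass Matthews correlation coefficient, and Cohen's kappa can be computed for a 3-way classification. It is not the study's evaluation code: the function name `classification_metrics`, the integer class labels, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Return (accuracy, MCC, Cohen's kappa) for paired multiclass labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    s = len(y_true)                                    # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correctly classified samples
    t_counts = Counter(y_true)                         # how often each class truly occurs
    p_counts = Counter(y_pred)                         # how often each class is predicted
    labels = set(t_counts) | set(p_counts)

    sum_tp = sum(t_counts[k] * p_counts[k] for k in labels)
    sum_t2 = sum(v * v for v in t_counts.values())
    sum_p2 = sum(v * v for v in p_counts.values())

    accuracy = c / s

    # Multiclass generalization of the Matthews correlation coefficient.
    denom = sqrt((s * s - sum_p2) * (s * s - sum_t2))
    mcc = (c * s - sum_tp) / denom if denom else 0.0   # 0.0 is this sketch's convention

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = sum_tp / (s * s)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0  # sketch convention

    return accuracy, mcc, kappa


# Toy labels only (hypothetical classes 0, 1, 2); not data from the study.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 2]
acc, mcc, kappa = classification_metrics(toy_true, toy_pred)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}  kappa={kappa:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient and Cohen's kappa both correct for agreement expected from the class distribution alone, which is one common reason for reporting them alongside accuracy on small or imbalanced test sets.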