Abstract
Background: The rapid advancement of radiomics and artificial intelligence (AI) has provided novel tools for diagnosing esophageal cancer. This study combines muscle imaging features with conventional esophageal imaging features to construct deep learning and machine learning diagnostic models.
Methods: This retrospective study included 1066 patients who underwent radical esophagectomy. On preoperative computed tomography (CT) images, the esophagus, stomach, and muscles (bilateral iliopsoas and erector spinae) were segmented automatically and then adjusted manually. Diagnostic models were developed using deep learning (2D and 3D neural networks) and traditional machine learning (11 algorithms applied to PyRadiomics-derived features). Multimodal features were reduced with principal component analysis (PCA) and then fused for the final analysis.
Results: In a comparison based on CT images from all 1066 patients, the muscle-based model outperformed the esophagus-plus-stomach model in predicting N2 stage (0.63 ± 0.11 vs. 0.52 ± 0.11, p = 0.03). Multimodal fusion models were then built to predict pathological subtype, T stage, and N stage. For pathological subtype, the logistic regression (LR) fusion model performed best, with an accuracy (ACC) of 0.919 in the training set and 0.884 in the validation set. For T stage, the support vector machine (SVM) model achieved the highest accuracy (0.909 in the training set and 0.907 in the validation set). For N stage, the multilayer perceptron (MLP) fusion model performed best among all models tested, although its accuracy remained moderate (ACC = 0.704 in the training set and 0.685 in the validation set), leaving room for further optimization. Fusion models significantly outperformed single-modality models.
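The Methods state that multimodal features were reduced with PCA and then fused before classification. The sketch below illustrates one common way to do this, assuming per-modality standardization and PCA followed by concatenation of the component scores (early fusion) and a logistic regression classifier. The abstract does not specify the fusion order, the number of retained components, or the classifier settings, so the helper names, feature counts, and `n_components=5` are hypothetical. It uses only NumPy and synthetic data, not study data.

```python
import numpy as np


def fit_pca(X, n_components):
    """Standardize X (samples x features) and fit PCA via SVD on the training rows."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T


def transform_pca(X, mean, std, components):
    """Project rows onto the principal axes learned from the training set."""
    return ((X - mean) / std) @ components


def fuse_modalities(train_blocks, test_blocks, n_components):
    """Reduce each modality with its own PCA, then concatenate the scores (early fusion)."""
    train_parts, test_parts = [], []
    for X_tr, X_te in zip(train_blocks, test_blocks):
        mean, std, comps = fit_pca(X_tr, n_components)  # fitted on training rows only
        train_parts.append(transform_pca(X_tr, mean, std, comps))
        test_parts.append(transform_pca(X_te, mean, std, comps))
    return np.hstack(train_parts), np.hstack(test_parts)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent on mean log-loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        logits = np.clip(X @ w + b, -30, 30)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-logits))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(X, w, b):
    """Return hard 0/1 labels at a 0.5 probability threshold."""
    return (X @ w + b > 0).astype(int)


# Synthetic demo data (NOT study data): a latent factor drives the label and the muscle block.
rng = np.random.default_rng(0)
n_train, n_test = 200, 80
n = n_train + n_test
latent = rng.normal(size=n)
y = (latent > 0).astype(int)
blocks = {
    "esophagus": rng.normal(size=(n, 50)),  # hypothetical feature counts
    "esophagus_stomach": rng.normal(size=(n, 60)),
    "muscle": latent[:, None] + rng.normal(size=(n, 40)),
}
train_blocks = [X[:n_train] for X in blocks.values()]
test_blocks = [X[n_train:] for X in blocks.values()]

F_train, F_test = fuse_modalities(train_blocks, test_blocks, n_components=5)
w, b = train_logistic_regression(F_train, y[:n_train])
test_acc = (predict(F_test, w, b) == y[n_train:]).mean()
print(F_train.shape, F_test.shape, round(float(test_acc), 3))
```

Fitting the standardization and PCA on the training rows only, then applying them unchanged to the held-out rows, keeps validation information from leaking into the reduced features. Swapping `train_logistic_regression` for an SVM or MLP would mirror the other fusion classifiers named in the Results.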
Conclusions: Using CT data from 1066 patients, this study systematically constructed models to predict the pathological subtype, T stage, and N stage of esophageal cancer. Comparing models built on the esophagus, esophagus-plus-stomach, and muscle modalities showed that muscle imaging features contribute to diagnostic accuracy. Multimodal fusion models consistently outperformed single-modality models.