Abstract
BACKGROUND: Esophageal cancer (EC) is a major cause of cancer mortality, and accurate preoperative T staging guides treatment decisions. Conventional artificial intelligence approaches that rely on a single data type achieve suboptimal accuracy in T-stage classification. OBJECTIVE: This study aimed to develop and validate a combined deep learning model for T-stage diagnosis that incorporates CT features and clinical variables. METHODS: A total of 443 EC patients who underwent postoperative pathological evaluation at three centers from 2018 to 2023 were included; CT images, demographic information, and laboratory test results were collected. From the CT images, the hierarchical multiscale feature fusion network (HMFFN) extracted deep learning features, while three-dimensional reconstruction provided handcrafted morphologic features. Clinical features were obtained from baseline clinical data, laboratory tests, and endoscopic examination results. The auto-metric graph neural network (AMGNN) was placed after the feature extraction module to fuse the three types of features for T-stage classification. RESULTS: A total of 394 patients from the internal datasets (mean [SD] age, 61.83 [7.42] years; 320 men [81.22%]) and 49 patients from the external datasets (mean [SD] age, 62.84 [7.60] years; 41 men [83.67%]) were evaluated. The proposed HMFFN-AMGNN model achieved AUCs of 0.848 (95% CI: 0.788-0.902) and 0.867 (95% CI: 0.792-0.929) and accuracies of 72.727% (95% CI: 62.121-83.333) and 77.551% (95% CI: 65.306-87.755) in the internal and external test cohorts, respectively. CONCLUSION: The combined deep learning model, integrating CT features with clinical variables, achieved strong predictive performance in the diagnosis of EC T stage, highlighting its potential to facilitate clinical decision-making.
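The abstract reports 95% confidence intervals for accuracy but does not state how they were computed. As a minimal sketch, assuming a percentile bootstrap over patients (an assumption, not the authors' confirmed method), an interval of this kind could be obtained as follows. The function name `bootstrap_accuracy_ci` and the labels are hypothetical; the labels are chosen only so that 38 of 49 cases are correct, which matches the size and reported accuracy of the external cohort, and they are not the study's per-patient predictions.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) from a percentile bootstrap over cases.

    Hypothetical helper for illustration; the resampling scheme is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    rng = random.Random(seed)
    # 1 where the prediction matches the reference label, 0 otherwise
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    point = sum(correct) / n
    # Resample cases with replacement and recompute accuracy each time
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


# Illustrative labels only: 49 cases, 38 of them classified correctly
y_true = [1] * 49
y_pred = [1] * 38 + [0] * 11
acc, lo, hi = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy={acc:.3%}, 95% CI=({lo:.3%}, {hi:.3%})")
```

Because each bootstrap accuracy is a count of correct cases divided by the cohort size, the interval bounds fall on multiples of 1/49 here. The exact bounds depend on the random seed and the number of resamples.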