Abstract
Neoadjuvant immunochemotherapy (nICT) has significantly improved the treatment of locally advanced esophageal cancer (EC), yet accurately identifying which patients will respond remains a major challenge. In this study, we introduce eSPARK, a multimodal framework designed to integrate routinely available clinical data to inform decision-making about nICT for EC. The model is developed using data from 344 patients from three independent regions, each with paired pre-treatment computed tomography (CT) imaging and pathological slides, as well as postoperative pathological complete response (pCR) outcomes. By incorporating cytological semantic information, eSPARK demonstrates superior generalizability, outperforming single-modality models and achieving robust predictive accuracy across multicenter datasets. Additionally, a multi-scale interpretability module identifies several biomarkers associated with nICT response, including the neutrophil-to-lymphocyte ratio (NLR) in the tumor microenvironment. Our findings underscore the potential of eSPARK as a tool for personalized therapeutic decision-making in locally advanced EC, as well as its broader implications for advancing precision oncology through multidisciplinary data integration.