Abstract
Accurate segmentation of esophageal cancer in CT images is crucial for treatment planning but remains difficult due to variable tumor morphology, low contrast with surrounding tissues, and blurred tumor boundaries. We propose MDPNet, a Multi-scale Difference Perception Network for accurate esophageal cancer segmentation in CT images. MDPNet integrates three key modules: a Dynamic Feature Enhancement (DFE) strategy that fuses global and local context, a Cross-level Difference Modeling (CDM) module that highlights foreground-background differences, and a Multi-stage Foreground Enhancement (MFE) mechanism that progressively refines boundaries. Experiments on the self-built ECD 2D dataset and an external test set show that MDPNet outperforms state-of-the-art methods, achieving Dice coefficients of 0.82 and 0.78, respectively. These results indicate that MDPNet improves segmentation accuracy and exhibits preliminary generalization across multi-center test sets, suggesting its potential as a decision-support tool.