Abstract
The U-shaped structure has become a foundational approach in medical image segmentation, consistently demonstrating strong performance across a wide range of tasks. Most current models build on this framework and customize the encoder and decoder components to achieve higher accuracy on specific segmentation challenges. However, these gains often come at the cost of increased parameter counts, which limits the practicality of such models in real-world applications. In this study, we propose an E-shaped segmentation framework that discards the traditional step-by-step resolution-recovery decoding process and instead directly aggregates the multi-scale features extracted by the encoder at each stage for deep cross-level integration. Additionally, we propose a multi-scale large-kernel convolution (MLKConv) module designed to enhance high-level feature representation by capturing both local and global contextual information. Compared with the U-shaped structure, the proposed E-shaped structure substantially reduces the number of parameters while delivering superior performance, especially in complex segmentation tasks. Based on this structure, we develop two segmentation networks, one for two-dimensional (2D) and one for three-dimensional (3D) medical images. 2D E-SegNet is evaluated on four 2D segmentation benchmark datasets (Synapse multi-organ, ACDC, Kvasir-Seg, and BUSI), while 3D E-SegNet is assessed on four 3D segmentation benchmark datasets (Synapse, ACDC, NIH Pancreas, and Lung). Experimental results demonstrate that our approach outperforms the current leading U-shaped models across multiple datasets, achieving new state-of-the-art (SOTA) performance with fewer parameters. In summary, the E-shaped structure offers a lighter-weight alternative to U-shaped designs for medical image segmentation. Our code is publicly available at https://github.com/zhaoqi106/E-SegNet.
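The direct aggregation of multi-scale encoder features can be illustrated with a minimal NumPy sketch. It is not the E-SegNet implementation: the four stage shapes, the nearest-neighbour upsampling, the channel-wise concatenation, and the helper names `upsample_nearest` and `aggregate_multiscale` are all assumptions chosen for illustration, and the actual fusion operator and the MLKConv module are not modelled here.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def aggregate_multiscale(features):
    """Align every stage to the finest resolution, then concatenate channels.

    `features` is a list of (C_i, H_i, W_i) arrays ordered from the
    shallowest (highest-resolution) stage to the deepest one.
    """
    target_h = features[0].shape[1]
    aligned = []
    for feat in features:
        factor = target_h // feat.shape[1]  # assumed to be an exact integer ratio
        aligned.append(upsample_nearest(feat, factor))
    # All stages are fused in a single step, with no stage-by-stage decoder.
    return np.concatenate(aligned, axis=0)


# Hypothetical encoder outputs: channels grow as resolution shrinks.
rng = np.random.default_rng(0)
stage_shapes = [(8, 16, 16), (16, 8, 8), (32, 4, 4), (64, 2, 2)]
features = [rng.standard_normal(shape) for shape in stage_shapes]

fused = aggregate_multiscale(features)
print(fused.shape)  # (120, 16, 16)
```

In this sketch every stage contributes to the fused tensor in one step, whereas a U-shaped decoder would recover resolution gradually through a chain of upsampling blocks, each with its own learned parameters.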