Abstract
The encoder-decoder paradigm has emerged as the prevailing framework in medical image segmentation, and recent studies within this paradigm have demonstrated its remarkable effectiveness for lesion delineation. However, because the encoder compresses high-dimensional inputs and the decoder must reconstruct the target from the encoder's limited latent representation, a fixed encoder-decoder pipeline inevitably introduces a semantic gap between the two stages. To bridge this gap, we present MAFormer, a novel U-shaped network tailored for medical image segmentation. Specifically, we design a Multi-scale Dependency Feature Construction (MDFC) module that refines the skip-connection pathway to fuse semantic information across hierarchical levels. In addition, we propose an Attention Representation Reinforcement Module (ARRM) that strengthens encoder-decoder semantic alignment via bidimensional similarity computation and a hierarchical masking strategy. Extensive experiments on the GlaS, Synapse, and ISIC2018 datasets confirm that MAFormer consistently surpasses state-of-the-art encoder-decoder methods on both large- and small-scale datasets. In particular, it achieves higher Dice scores, underscoring its effectiveness in improving overall segmentation accuracy.