Abstract
Automated polyp detection plays a critical role in the early diagnosis of colorectal cancer, which ranks as the second leading cause of cancer-related mortality worldwide. However, existing segmentation methods struggle to handle complex polyp shapes and size variations, and to generalise across diverse datasets. We propose a Multi-dimensional Residual Attention Network (MRANet) for polyp segmentation, which focuses on enhancing feature representation and ensuring robust performance across diverse clinical scenarios. During encoding, MRANet employs residual self-attention to capture the semantic information in high-level features, which then guides the refinement of low-level information. In addition, convolutions with Multiple Kernel and Dilation rates (CMKD) are integrated with residual channel and spatial attention to expand the model's receptive field, enhance encoder features, and accelerate convergence. In the decoding stage, MRANet uses the proposed Attention-based Scale Interaction Module (ASIM) to merge upsampled high-level features with low-level pixel information, enriching the low-level layers with semantic knowledge. A Residual-based Scale Fusion Module (RSFM) is further designed to merge low-level features while preserving high-frequency details such as edges and textures. Experiments demonstrate that MRANet effectively segments polyps with varying sizes, indistinct boundaries, and scattered distributions, achieving the best overall performance. Our code is available at https://github.com/hpguo1982/MRANet.
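
The receptive-field expansion attributed to CMKD rests on a standard property of dilated convolutions: a kernel of size k with dilation rate d covers d(k - 1) + 1 positions along each axis. The sketch below only illustrates that arithmetic for a set of parallel branches. The function names and the branch configuration are hypothetical and chosen for illustration; they are not taken from MRANet, whose actual kernel sizes and dilation rates are not stated in the abstract.

```python
from typing import Iterable, List, Tuple


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent covered along one axis by a single dilated convolution."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return dilation * (kernel_size - 1) + 1


def branch_extents(branches: Iterable[Tuple[int, int]]) -> List[int]:
    """Effective extent of each (kernel_size, dilation) branch."""
    return [effective_kernel_size(k, d) for k, d in branches]


# Hypothetical (kernel_size, dilation) pairs, NOT the configuration used in MRANet.
example_branches = [(1, 1), (3, 1), (3, 2), (5, 2)]
extents = branch_extents(example_branches)
print(extents)  # [1, 3, 5, 9]
```

Under these assumed settings, the widest branch spans 9 pixels per axis while using only 5 weights per axis, which is the sense in which mixing kernel sizes and dilation rates enlarges the receptive field without a proportional increase in parameters.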