Abstract
INTRODUCTION: Rectal cancer often originates from polyps. Early detection and timely removal of polyps are crucial for preventing colorectal cancer and inhibiting its progression to malignancy. While polyp segmentation algorithms are essential for aiding polyp removal, they face significant challenges due to the diverse shapes, unclear boundaries, and varying sizes of polyps. Additionally, capturing long-range dependencies remains difficult, with many existing algorithms struggling to converge effectively, limiting their practical application. METHODS: To address these challenges, we propose a novel Dual Encoder Multi-Scale Feature Fusion Network, termed VMDU-Net. This architecture employs two parallel encoders: one incorporates Vision Mamba modules, and the other integrates a custom-designed Cross-Shape Transformer. To enhance semantic understanding of polyp morphology and boundaries, we design a Mamba-Transformer-Merge (MTM) module that performs attention-weighted fusion across spatial and channel dimensions. Furthermore, Depthwise Separable Convolutions are introduced to facilitate multi-scale feature extraction and improve convergence efficiency by leveraging the inductive bias of convolution. RESULTS: Extensive experiments were conducted on five widely-used polyp segmentation datasets. The results show that VMDU-Net significantly outperforms existing state-of-the-art methods, especially in terms of segmentation accuracy and boundary detail preservation. Notably, the model achieved a Dice score of 0.934 on the Kvasir-SEG dataset and 0.951 on the CVC-ClinicDB dataset. DISCUSSION: The proposed VMDU-Net effectively addresses key challenges in polyp segmentation by leveraging complementary strengths of Transformer-based and Mamba-based modules. Its strong performance across multiple datasets highlights its potential for practical clinical application in early colorectal cancer prevention. CODE AVAILABILITY: The source code is publicly available at: https://github.com/sulayman-lee0212/VMDUNet/tree/4a8b95804178511fa5798af4a7d98fd6e6b1ebf7.