Abstract
BACKGROUND: Lesion segmentation in medical images is crucial for clinical diagnosis and treatment planning. However, existing methods often struggle to extract both local and global features effectively, limiting segmentation accuracy. To address this challenge, we propose a dual-branch network that integrates state space models (SSMs) with deep convolutional networks to enhance the extraction of both local and global features, thereby improving lesion segmentation performance.

METHODS: The proposed model employs a dual-branch encoder: one branch incorporates a visual state space encoder to efficiently model long-range contextual dependencies, while the other, based on a residual network, extracts hierarchical local features. To refine feature representation, we introduce a lightweight multi-scale depth-wise separable convolution block that adapts to varying lesion sizes while maintaining computational efficiency. The fused features are then processed by the decoder for high-precision segmentation.

RESULTS: Extensive experiments on the Kaggle_3M and Kvasir-SEG datasets demonstrated that the proposed model outperformed existing state-of-the-art models. Specifically, it achieved a Dice similarity coefficient (Dice) of 0.9140 and a false negative rate (FNR) of 0.0800 on the Kaggle_3M dataset, and a Dice of 0.9173 and an FNR of 0.0788 on the Kvasir-SEG dataset. Compared with other models, ours delivered superior quantitative results and visual segmentation quality. In addition, when trained on Kvasir-SEG and tested on two external datasets, our model demonstrated superior cross-dataset generalization.

CONCLUSIONS: The proposed model integrates SSMs and deep convolutional networks to improve lesion segmentation by effectively capturing both local and global features. It offers new insights for medical image segmentation with potential clinical applications.
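The abstract describes the multi-scale depth-wise separable convolution block as lightweight. As a rough illustration of why depth-wise separable convolutions save parameters, the sketch below compares parameter counts of a standard convolution against its depth-wise separable factorization; the channel sizes and kernel sizes are hypothetical, chosen only for illustration and not taken from the paper.

```python
# Parameter-count comparison: standard vs. depth-wise separable convolution.
# Channel and kernel sizes here are illustrative assumptions, not the paper's.

def standard_conv_params(c_in, c_out, k):
    # one k x k kernel per (input channel, output channel) pair
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # depth-wise stage: one k x k kernel per input channel;
    # point-wise stage: a 1 x 1 convolution that mixes channels
    return c_in * k * k + c_in * c_out

c_in, c_out = 64, 64
for k in (3, 5, 7):  # multi-scale kernel sizes (illustrative)
    std = standard_conv_params(c_in, c_out, k)
    sep = separable_conv_params(c_in, c_out, k)
    print(f"k={k}: standard={std}, separable={sep}, ratio={std / sep:.1f}x")
```

The saving grows with kernel size, which is why stacking several kernel sizes in one multi-scale block remains affordable when each branch is depth-wise separable.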
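The results are reported as Dice and FNR. For readers less familiar with these metrics, a minimal sketch of how both are computed from binary masks follows (the toy masks are invented for illustration; this is the standard definition of the metrics, not code from the paper).

```python
import numpy as np

def dice_and_fnr(pred, gt):
    """Dice similarity coefficient and false negative rate for binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()        # correctly segmented lesion pixels
    fn = np.logical_and(~pred, gt).sum()       # lesion pixels the model missed
    dice = 2 * tp / (pred.sum() + gt.sum())    # 2|A∩B| / (|A| + |B|)
    fnr = fn / gt.sum()                        # missed fraction of the lesion
    return dice, fnr

# Toy 1-D "masks": ground truth has 4 lesion pixels, prediction recovers 3.
gt = np.array([1, 1, 1, 1, 0, 0])
pred = np.array([1, 1, 1, 0, 1, 0])
d, f = dice_and_fnr(pred, gt)
# d = 2*3 / (4+4) = 0.75;  f = 1/4 = 0.25
```

A lower FNR is clinically important here because false negatives correspond to missed lesion tissue.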