Abstract
Skin cancer has become a global public health concern. Dermoscopy is a routine diagnostic method; however, to improve diagnostic accuracy it is often combined with skin punch biopsy, in which stained histological slides are examined under the microscope. Manual segmentation of lesion areas by physicians is subjective and time-consuming, so deep learning techniques have become the mainstream solution. Compared with foreground-background segmentation of dermoscopic images, semantic segmentation of whole-slide skin cancer images is more complex, requiring precise differentiation of 10 distinct tissue classes (including tumor, epidermis, dermis, hair follicle, sweat gland, and fat). Several epithelial and dermal tissue types exhibit similar morphological features and are spatially interwoven, which further increases segmentation difficulty. To address this, we propose a multi-frequency-domain attention-based dual-encoder network (MSF-VMDNet) that combines U-Net and Vision Mamba encoders for parallel feature extraction. The U-Net encoder incorporates an improved AFNO spectral decomposition module, which uses a frequency-domain mechanism to extract high-resolution multi-class semantic information; it further strengthens spatial information through multi-scale feature aggregation, improving segmentation accuracy at class boundaries and yielding clearer contours. The Vision Mamba encoder, built on a linear state space model (SSM), improves long-range dependency modeling and enhances both global and local feature perception. Using a multi-frequency-domain mechanism, this encoder maps subtle class-discriminative features of the histological slides into the frequency domain, reinforcing contextual features and reducing misclassification. In the decoding phase, an SCConv module fuses features across frequency bands and spatial levels. Experimental results show that MSF-VMDNet significantly outperforms existing methods on a skin cancer histological slide dataset, achieving an mIoU of 95.37% and a Dice coefficient of 95.11%. The model also demonstrates strong generalization in extended experiments on the ISIC 2018 dermoscopic image, PanNuke pathological nucleus, and Synapse datasets.