Abstract
Background/Objectives: Accurate segmentation of skin lesions is essential for early skin cancer detection. However, conventional CNNs have limited capacity to model long-range dependencies, which degrades performance on lesions with complex shapes. Methods: We propose MSDTCN-Net, a dual-encoder network that integrates ConvNeXt and a Deformable Transformer to extract both local details and global semantic information. A Squeeze-and-Excitation (SE) mechanism adaptively emphasizes informative channels. To address scale variation in lesions, we design a Multi-Scale Receptive Field (MSRF) module that combines multi-branch and dilated convolutions. Furthermore, a Hierarchical Feature Transfer (HFT) mechanism progressively guides high-level semantics to shallow layers, enhancing boundary reconstruction in the decoder. Results: Extensive experiments on the ISIC 2016, ISIC 2017, ISIC 2018, and PH2 datasets show that MSDTCN-Net achieves competitive performance across metrics including intersection over union (IoU), Dice coefficient, and accuracy (ACC), supporting its effectiveness and generalization in skin lesion segmentation. Conclusions: MSDTCN-Net combines local and global feature extraction, multi-scale adaptability, and semantic guidance to achieve accurate skin lesion segmentation, demonstrating its potential for clinical diagnostic applications.
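For readers unfamiliar with the reported metrics, the following is a minimal sketch of how IoU, Dice, and ACC are commonly computed for a pair of binary masks. It uses the standard confusion-matrix definitions rather than the evaluation code of this work. The function names `confusion_counts` and `segmentation_metrics` are hypothetical, and the convention of returning 1.0 when both masks are empty is an assumption; thresholding and per-image versus per-dataset averaging are not specified here.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN for two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # lesion pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as lesion
            elif t:
                fn += 1      # lesion pixel missed
            else:
                tn += 1      # background correctly predicted
    return tp, fp, fn, tn


def segmentation_metrics(pred, target):
    """Return IoU, Dice, and pixel accuracy (ACC) as a dict.

    Assumption: when the denominator is zero (e.g. both masks are empty),
    the metric is reported as 1.0. Other conventions exist.
    """
    tp, fp, fn, tn = confusion_counts(pred, target)
    union = tp + fp + fn
    total = union + tn
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    acc = (tp + tn) / total if total else 1.0
    return {"iou": iou, "dice": dice, "acc": acc}


# Toy 2x3 example: TP = 2, FP = 1, FN = 1, TN = 2
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
metrics = segmentation_metrics(pred_mask, true_mask)
print(metrics)  # IoU = 0.5, Dice = 2/3, ACC = 2/3
```

Dice is never lower than IoU for the same pair of masks (Dice = 2·IoU / (1 + IoU)), which is worth keeping in mind when comparing the two columns of a results table.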