Abstract
Accurate segmentation of skin lesions is essential for automated melanoma detection and dermoscopic image analysis. Traditional convolutional networks often fail to capture both fine-grained lesion boundaries and global contextual information. We propose SkinAttn-Net, a stage-aware segmentation framework that integrates complementary attention mechanisms according to the functional role of each network component: a vision transformer module at the encoder bottleneck models long-range dependencies and global context; Convolutional Block Attention Modules refine intermediate encoder-decoder feature interactions; and Squeeze-and-Excitation blocks recalibrate skip connections to preserve high-resolution lesion characteristics. The model was extensively evaluated on three widely used dermoscopic datasets, including intra- and cross-dataset experiments, ablation analyses, module-wise dropout sensitivity assessments, and qualitative visualization studies. Quantitative results show Dice scores of 0.926, 0.910, and 0.951 on ISIC 2017, ISIC 2018, and HAM10000, respectively, with consistently high precision, recall, and specificity. Cross-dataset evaluation demonstrates robust generalization across imaging conditions and lesion types. Ablation studies confirm the complementary contributions of the individual modules, while dropout analyses show that moderate regularization enhances stability and generalization. Comparisons with state-of-the-art methods highlight superior segmentation performance, and Grad-CAM visualizations indicate progressive refinement of lesion boundaries across decoder layers. SkinAttn-Net effectively combines hierarchical attention mechanisms to balance local detail preservation with global contextual modeling, offering a robust, generalizable solution for dermoscopic image analysis and clinical decision support in dermatology.
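To make the skip-connection recalibration concrete, the following is a minimal NumPy sketch of the standard Squeeze-and-Excitation operation the abstract refers to (global-average-pool squeeze, FC-ReLU-FC-sigmoid excitation, channel-wise rescaling). The tensor shapes, weight matrices, and reduction ratio here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_recalibrate(feat, w1, w2):
    """Squeeze-and-Excitation channel recalibration.

    feat: feature map of shape (C, H, W)
    w1:   reduction weights, shape (C // r, C)   -- hypothetical, r = reduction ratio
    w2:   expansion weights, shape (C, C // r)
    """
    z = feat.mean(axis=(1, 2))                   # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))    # excitation: FC -> ReLU -> FC -> sigmoid
    return feat * s[:, None, None]               # rescale each channel by its weight

# Illustrative usage with random weights (shapes only; not trained parameters)
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
out = se_recalibrate(feat,
                     rng.standard_normal((C // r, C)),
                     rng.standard_normal((C, C // r)))
```

In SkinAttn-Net this recalibration is applied on the skip connections, so high-resolution encoder features are reweighted channel-wise before being fused with decoder features.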