Abstract
Accurate skin lesion classification algorithms play a crucial role in improving patient survival rates by enabling early detection and timely treatment. However, current methods have limited feature extraction capabilities, a weakness compounded by data imbalance and high intra-class variance, which makes precise diagnosis difficult. To address these problems, this study proposes SCTFD (Synthetic Classification Transformer Framework for Dermoscopy), a novel dermoscopic image classification framework designed to improve classification accuracy. First, SCTFD generates minority-class samples using CN-SMOTE, a nearest sampling synthesis approach based on an encoder-decoder structure. Next, it extracts features using MARD-Net (Multi-head Attention Residual Dilated Network), which integrates spatial-channel attention to enhance CNN performance and global sliding window attention to reduce the computational complexity of the Transformer. Finally, the loss is computed using FDLoss, which is specifically designed to address data imbalance and high intra-class variance. The proposed method is validated on the ISIC 2018 and ISIC 2019 public datasets. SCTFD achieves an accuracy of 92.81% and an F1 score of 0.93 on ISIC 2018, and an accuracy of 91.33% and an F1 score of 0.88 on ISIC 2019, lowering the classification barriers for critical diagnostic tasks.
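
For readers unfamiliar with SMOTE-style oversampling, the following is a minimal sketch of the general idea behind synthesizing minority-class samples by interpolating between nearest neighbors. It is not the CN-SMOTE implementation, whose details are not given in this abstract. It assumes that an encoder has already mapped minority-class images to latent vectors and that a decoder (omitted here) would map the synthetic vectors back to images. The function name `smote_like_oversample` and its parameters are hypothetical.

```python
import numpy as np

def smote_like_oversample(latents, n_new, k=3, seed=0):
    """Classic SMOTE-style interpolation on latent vectors (illustrative only).

    latents: (n, d) array of minority-class latent vectors (assumed to come
             from some encoder; the encoder and decoder are not modeled here).
    n_new:   number of synthetic vectors to generate.
    k:       number of nearest neighbors considered for each base sample.
    """
    rng = np.random.default_rng(seed)
    latents = np.asarray(latents, dtype=float)
    n = latents.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances; exclude self-matches.
    diffs = latents[:, None, :] - latents[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # (n, k) neighbor indices

    # Pick a random base sample and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    mate = neighbors[base, rng.integers(0, k, size=n_new)]

    # Each synthetic point lies on the segment between base and neighbor.
    lam = rng.random((n_new, 1))
    return latents[base] + lam * (latents[mate] - latents[base])
```

Because every synthetic vector is a convex combination of two existing minority samples, the new points stay inside the region already occupied by that class in latent space. This is the property that motivates interpolation-based oversampling in general.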