Abstract
Skin cancer, and especially melanoma, its most severe form, has increased in incidence in recent decades. It develops from cells that grow abnormally, invade surrounding tissue, and can spread throughout the body. Early and accurate diagnosis is essential to prevent disease progression and to allow less invasive clinical treatment. Convolutional neural networks (CNNs) have substantially improved skin cancer diagnosis by extracting features from complex dermoscopic images and improving lesion classification performance. In this study, MedFusionNet, a novel deep network that combines ConvNeXt and Vision Transformer (ViT) architectures through adaptive attention-based feature fusion, is proposed for automatic multi-class skin cancer classification. The model is evaluated on two dermoscopy benchmark datasets, ISIC-2019 and HAM10000, both of which reflect the real-world problem of class imbalance. MedFusionNet is assessed with several evaluation metrics, including accuracy, precision, recall, and AUC, and is compared with deep learning baselines such as ResNet50, MobileNetV2, DenseNet121, and ViT-B16. The experimental results show that MedFusionNet outperforms these models, achieving classification accuracies of 98.80% on HAM10000 and 97.90% on ISIC-2019. Grad-CAM visualizations qualitatively show that the model focuses on clinically relevant lesion regions, providing interpretive insight without claiming complete causal explainability. These results indicate that the proposed model handles multi-class tasks in dermatological imaging efficiently and that MedFusionNet is a suitable candidate for real-world computer-aided diagnosis systems.
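The abstract describes fusing ConvNeXt and ViT branch features through an adaptive attention mechanism. The exact fusion module is not specified here, but a minimal sketch of one common pattern, in which a learned gating vector scores each branch's embedding and a softmax turns the scores into adaptive fusion weights, might look as follows. The function names, the gating vector `w_gate`, and the feature dimension are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_fusion(conv_feat, vit_feat, w_gate):
    # Stack the two branch embeddings, score each against a (hypothetical)
    # learned gating vector, and softmax the scores into fusion weights.
    feats = np.stack([conv_feat, vit_feat])   # shape (2, d)
    scores = feats @ w_gate                   # shape (2,)
    weights = softmax(scores)                 # adaptive, sums to 1
    return weights @ feats                    # fused feature, shape (d,)

rng = np.random.default_rng(0)
d = 8
conv_feat = rng.standard_normal(d)  # stand-in for a ConvNeXt embedding
vit_feat = rng.standard_normal(d)   # stand-in for a ViT embedding
w_gate = rng.standard_normal(d)     # hypothetical learned gating parameters
fused = adaptive_fusion(conv_feat, vit_feat, w_gate)
print(fused.shape)
```

In practice the weights would be produced by a trained sub-network rather than a fixed vector, but the key property is the same: the fused representation is a convex combination of the two branches, with weights adapted per input.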