Abstract
Early and accurate diagnosis of melanoma remains a major challenge because skin lesions are heterogeneous and traditional diagnostic tools are limited. In this study, we introduce HyperFusion-Net, a hybrid deep learning architecture that integrates a Multi-Path Vision Transformer (MPViT) with an attention U-Net to perform melanoma classification and lesion segmentation simultaneously on dermoscopic images. Unlike conventional CNN-based methods, HyperFusion-Net combines the global feature extraction capabilities of transformers with the spatial accuracy of the U-Net, and a mutual attention fusion block merges the resulting semantic and spatial features. The model was trained and evaluated on four public ISIC datasets containing over 60,000 dermoscopic images. Preprocessing techniques such as hair removal, clipping, and normalization were applied to improve robustness. Experimental results show that HyperFusion-Net consistently outperforms state-of-the-art models, including U-Net, DeepLabV3+, TransUNet, and Swin-UNet, in both classification (accuracy: 93.24%, AUC: 95.80%) and segmentation (Dice coefficient: 0.945 on ISIC 2024). Ablation studies confirm that the multi-path design and the fusion strategy improve diagnostic performance while maintaining computational efficiency. Furthermore, the model generalizes well across datasets with different lesion types and imaging conditions.
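For readers unfamiliar with the segmentation metric reported above, the following is a minimal sketch of the standard Dice coefficient, 2|P ∩ G| / (|P| + |G|), for a predicted binary mask P and a ground-truth mask G. It is an illustration only, not the evaluation code used in this study: the function name `dice_coefficient`, the smoothing term `eps`, and the toy masks are assumptions introduced here.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between two binary masks given as equally shaped 2D lists.

    `eps` is an assumed smoothing term that avoids division by zero when
    both masks are empty (in which case the result is 1.0).
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = int(bool(p)), int(bool(t))  # binarize each pixel
            intersection += p & t              # pixel is lesion in both masks
            pred_sum += p
            target_sum += t

    return (2 * intersection + eps) / (pred_sum + target_sum + eps)


# Toy 2x3 masks (hypothetical data): 2 overlapping pixels, 3 lesion pixels each.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
score = dice_coefficient(predicted, ground_truth)
```

In the toy example the masks share 2 lesion pixels out of 3 each, so the score is approximately 2·2 / (3 + 3) ≈ 0.667; a value of 1.0 indicates perfect overlap.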