Abstract
Skin cancer remains a major global health problem, and early detection is critical to reducing mortality. However, the heterogeneity of skin lesions, compounded by class imbalance, often hampers automated classification, particularly in unconstrained settings such as smartphone imagery that lacks dermoscopic clarity. In this work, we present DualRefNet, a multimodal deep learning framework built on a dual-stage feature refinement strategy. First, an auxiliary super-resolution task enriches visual representations; second, a class-frequency-based regularization of the final fully connected layers refines the fused features, reducing errors caused by high intra-class and low inter-class variability. A weighted cross-entropy loss further mitigates class imbalance. Empirical evaluations on the PAD-UFES-20 and ISIC-2019 datasets yield balanced accuracies of 0.845 and 0.815, respectively, demonstrating DualRefNet's robustness under varied imaging conditions. Confusion-matrix and class-wise analyses further show consistent performance across all categories, making the model a strong candidate for widespread, resource-constrained deployment.
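As a concrete illustration, the sketch below shows one common way to implement a weighted cross-entropy loss in PyTorch. The abstract does not specify DualRefNet's exact weighting scheme, so the inverse-class-frequency weights and the class counts used here are assumptions for illustration only.

```python
# Minimal sketch of a weighted cross-entropy loss for imbalanced classes.
# Assumption: weights are inversely proportional to class frequency; the
# paper's actual scheme may differ.
import torch
import torch.nn as nn

def make_weighted_ce(class_counts):
    """Build a cross-entropy loss with per-class weights w_c = N / (C * n_c),
    where N is the total sample count, C the number of classes, and n_c the
    count for class c (assumed weighting scheme)."""
    counts = torch.tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)
    return nn.CrossEntropyLoss(weight=weights)

# Usage with hypothetical counts for a 6-class lesion dataset:
criterion = make_weighted_ce([845, 730, 52, 244, 192, 730])
logits = torch.randn(8, 6)            # batch of 8 predictions
targets = torch.randint(0, 6, (8,))   # ground-truth labels
loss = criterion(logits, targets)
```

Up-weighting rare classes in this way penalizes their misclassification more heavily, which is one standard route to the balanced per-class performance the abstract reports.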