Abstract
Introduction
Skin cancer diagnosis currently relies heavily on visual assessment by dermatologists, which creates challenges for standardization and accessibility. Machine learning (ML) approaches, particularly convolutional neural networks (CNNs), have shown promise in automated detection systems, but they often require significant computational resources and are difficult to interpret, both of which limit their clinical adoption. This study investigates whether lightweight, transparent tree-based ensemble methods, specifically Random Forest (RF) and Gradient Boosted Decision Trees (GBDT), can achieve accuracy comparable to that of a compact CNN in classifying four common dermoscopic categories: basal cell carcinoma (BCC), benign keratosis-like lesions (BKL), melanocytic nevi (MN), and melanoma.

Methods
A publicly available archive supplied 8,000 dermoscopic images, roughly 2,000 per lesion class. Each image underwent color-constancy correction, hair removal, and tight cropping. Augmentation with rotations, flips, zooms, and contrast-limited adaptive histogram equalization was used to mitigate class imbalance. Handcrafted descriptors (Haralick texture features, local binary patterns (LBP), and red-green-blue histograms) were combined into a 768-element feature vector, which was then z-score normalized. Hyperparameters for RF and GBDT were optimized by Bayesian search within five-fold stratified cross-validation. A lightweight MobileNetV2 CNN served as the deep learning (DL) benchmark. Model performance was quantified on a 20% hold-out set using accuracy, macro-averaged F-score, and the area under the receiver operating characteristic curve (AUC). Feature contributions were interpreted with Shapley Additive Explanations (SHAP).

Results
GBDT achieved an accuracy of 89% and a macro-averaged F-score of 0.88, narrowly outperforming RF (86% accuracy, F-score of 0.85). Both ensembles exceeded an AUC of 0.94 for melanoma detection, matching the MobileNetV2 benchmark while training more than 10 times faster. SHAP highlighted blue-black pigmentation and irregular border texture as the most influential cues, in agreement with established dermatological heuristics, which supports the interpretability of the models.

Conclusion
This study demonstrates that a traditional ML approach can classify skin lesions effectively, providing a practical and interpretable alternative to DL models. With careful feature engineering, tree-based ensemble models can rival compact DL networks for multi-class skin lesion classification while offering faster training and clearer decision logic. These characteristics make them appealing for deployment in resource-constrained settings and in point-of-care diagnostic tools.
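
The evaluation protocol summarized under Methods can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: synthetic features from `make_classification` stand in for the handcrafted descriptor vectors, `RandomizedSearchCV` is substituted for the Bayesian search (which scikit-learn does not provide), the macro-averaged F-score is assumed to be F1, and the search spaces, sample count, and the names `evaluate` and `CLASSES` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import (RandomizedSearchCV, StratifiedKFold,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Label order is an assumption made for this sketch: 0..3 map to these classes.
CLASSES = ["BCC", "BKL", "MN", "melanoma"]


def evaluate(model, param_distributions, X_train, y_train, X_test, y_test, seed=0):
    """Tune with five-fold stratified CV, then score on the hold-out set."""
    # Scaling sits inside the pipeline so z-score statistics are refit per fold.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Random search is a stand-in for the Bayesian search named in the text.
    search = RandomizedSearchCV(pipe, param_distributions, n_iter=3, cv=cv,
                                scoring="f1_macro", random_state=seed)
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    proba = search.predict_proba(X_test)
    mel = CLASSES.index("melanoma")
    return {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f": f1_score(y_test, pred, average="macro"),
        # One-vs-rest AUC for melanoma against the other three classes.
        "melanoma_auc": roc_auc_score(y_test == mel, proba[:, mel]),
    }


# Synthetic stand-in for the feature matrix (far smaller than 8,000 x 768).
X, y = make_classification(n_samples=400, n_features=32, n_informative=12,
                           n_classes=4, random_state=0)
# Stratified 20% hold-out set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

results = {
    "RF": evaluate(RandomForestClassifier(random_state=0),
                   {"clf__n_estimators": [50, 100],
                    "clf__max_depth": [None, 8, 16]},
                   X_train, y_train, X_test, y_test),
    "GBDT": evaluate(GradientBoostingClassifier(random_state=0),
                     {"clf__n_estimators": [30, 60],
                      "clf__learning_rate": [0.05, 0.1, 0.2]},
                     X_train, y_train, X_test, y_test),
}
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Keeping the scaler inside the pipeline means the normalization statistics are estimated only from the training folds, so the cross-validated scores are not inflated by information from the validation folds. The numbers printed on synthetic data are not comparable to the results reported above.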