Abstract
BACKGROUND: In recent years, deep learning algorithms based on dermatoscopy have shown great potential in diagnosing basal cell carcinoma (BCC). However, the diagnostic performance of deep learning algorithms remains controversial. OBJECTIVE: This meta-analysis evaluates the diagnostic performance of deep learning algorithms based on dermatoscopy in detecting BCC. METHODS: An extensive search in PubMed, Embase, and Web of Science databases was conducted to locate pertinent studies published until November 4, 2024. This meta-analysis included articles that reported the diagnostic performance of deep learning algorithms based on dermatoscopy for detecting BCC. The quality and risk of bias in the included studies were assessed using the modified Quality Assessment of Diagnostic Accuracy Studies 2 tool. A bivariate random-effects model was used to calculate the pooled sensitivity and specificity, both with 95% CIs. RESULTS: Of the 1941 studies identified, 15 (0.77%) were included (internal validation sets of 32,069 patients or images; external validation sets of 200 patients or images). For dermatoscopy-based deep learning algorithms, the pooled sensitivity, specificity, and area under the curve (AUC) were 0.96 (95% CI 0.93-0.98), 0.98 (95% CI 0.96-0.99), and 0.99 (95% CI 0.98-1.00). For dermatologists' diagnoses, the sensitivity, specificity, and AUC were 0.75 (95% CI 0.66-0.82), 0.97 (95% CI 0.95-0.98), and 0.96 (95% CI 0.94-0.98). The results showed that dermatoscopy-based deep learning algorithms had a higher AUC than dermatologists' performance when using internal validation datasets (z=2.63; P=.008). CONCLUSIONS: This meta-analysis suggests that deep learning algorithms based on dermatoscopy exhibit strong diagnostic performance for detecting BCC. However, the retrospective design of many included studies and variations in reference standards may restrict the generalizability of these findings. The models evaluated in the included studies generally showed improved performance over that of dermatologists in classifying dermatoscopic images of BCC using internal validation datasets, highlighting their potential to support future diagnoses. However, performance on internal validation datasets does not necessarily translate well to external validation datasets. Additional external validation of these results is necessary to enhance the application of deep learning in dermatological diagnostics. TRIAL REGISTRATION: PROSPERO International Prospective Register of Systematic Reviews CRD42025633947; https://www.crd.york.ac.uk/PROSPERO/view/CRD42025633947.