Abstract
Background: Deep convolutional neural networks (DCNNs) are increasingly used in computer-aided dental diagnostics. However, the relative diagnostic performance of commonly applied architectures, particularly Faster R-CNN and Mask R-CNN, has not been systematically synthesized across imaging modalities. This systematic review and meta-analysis compared the diagnostic accuracy of Faster R-CNN and Mask R-CNN for dental caries detection on radiographic and photographic images.
Methods: PubMed (MEDLINE), EMBASE, Web of Science, and Scopus were systematically searched for studies published up to 15 June 2025. Studies applying Faster R-CNN and/or Mask R-CNN to dental caries detection were included. Binary diagnostic accuracy data (true/false positives and negatives) were extracted, and pooled sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were estimated using a bivariate random-effects model. Study quality was assessed with QUADAS-AI, and radiomics-based radiographic studies were additionally evaluated using the Radiomics Quality Score (RQS). The protocol was registered in PROSPERO (CRD420251074443).
Results: Seventeen studies met the inclusion criteria. Across all imaging modalities, Mask R-CNN showed significantly higher pooled sensitivity (85.6% vs. 71.7%, p = 0.0244), specificity (94.2% vs. 81.4%, p = 0.00089), and AUC (0.95 vs. 0.84, p = 0.0053) than Faster R-CNN. In radiographic images, Mask R-CNN likewise showed higher pooled sensitivity (86.3% vs. 67.2%, p = 0.0497), specificity (96.5% vs. 85.0%, p = 0.00105), and AUC (0.97 vs. 0.86, p = 0.0067) than Faster R-CNN. In photographic images, Mask R-CNN achieved a higher AUC (0.91 vs. 0.83, p = 0.048), whereas differences in pooled sensitivity (83.5% vs. 77.3%, p = 0.435) and specificity (86.0% vs. 75.1%, p = 0.156) were not statistically significant.
Conclusions: Faster R-CNN and Mask R-CNN both show potential for dental caries detection, but current evidence is limited by substantial heterogeneity, predominantly retrospective designs, and variability in imaging and labeling. Across the included studies, Mask R-CNN showed higher pooled performance estimates than Faster R-CNN, with the clearest differences in radiographic applications; however, this comparison is indirect and should be considered suggestive rather than definitive given study-level heterogeneity and uncertainty in the reference standard in a sizable proportion of studies. Prospective, multi-center studies with standardized imaging protocols, rigorous annotation, and independent external validation are required to support reliable clinical implementation.
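To illustrate the pooling step described in the Methods, the following Python sketch computes per-study sensitivity and specificity from 2 × 2 counts and pools each on the logit scale with a DerSimonian-Laird random-effects estimator. This is a deliberately simplified univariate stand-in, not the review's analysis. A bivariate random-effects model jointly models logit sensitivity and logit specificity together with their between-study correlation, and a summary AUC is typically derived from the resulting summary ROC curve. Neither step is reproduced here. The study counts, the function names, and the 0.5 continuity correction applied to every cell are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical 2 x 2 diagnostic counts for one study."""
    tp: int
    fp: int
    fn: int
    tn: int


def logit_with_variance(events: int, non_events: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    Assumption: a 0.5 continuity correction is added to every cell,
    not only to studies with zero cells.
    """
    a = events + 0.5
    b = non_events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def dersimonian_laird(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Univariate DerSimonian-Laird random-effects pooling; returns (pooled, tau^2)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pooled_sens_spec(studies: list[Study]) -> tuple[float, float]:
    """Pool sensitivity and specificity separately (ignores their correlation)."""
    sens = [logit_with_variance(s.tp, s.fn) for s in studies]
    spec = [logit_with_variance(s.tn, s.fp) for s in studies]
    pooled_sens, _ = dersimonian_laird([e for e, _ in sens], [v for _, v in sens])
    pooled_spec, _ = dersimonian_laird([e for e, _ in spec], [v for _, v in spec])
    return inv_logit(pooled_sens), inv_logit(pooled_spec)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the included studies.
    studies = [Study(80, 6, 14, 120), Study(55, 12, 20, 90), Study(140, 9, 18, 210)]
    sens, spec = pooled_sens_spec(studies)
    print(f"pooled sensitivity = {sens:.3f}, pooled specificity = {spec:.3f}")
```

Pooling on the logit scale keeps the back-transformed estimates inside (0, 1). Fitting the two margins separately, as done here, discards the threshold-induced trade-off between sensitivity and specificity that motivates the bivariate model.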