Abstract
Background/Objectives: This study systematically benchmarks U-Net-based deep learning models for automatic tooth segmentation in panoramic dental radiographs, focusing on how segmentation accuracy changes as computational cost increases across encoder backbones. Methods: U-Net models with ImageNet-pretrained ResNet, EfficientNet, DenseNet, and MobileNetV3-Small encoders were evaluated on the publicly available Tufts Dental Database (1000 panoramic radiographs) using five-fold cross-validation. Segmentation performance was quantified with the Dice coefficient and Intersection over Union (IoU), and computational efficiency was characterized by parameter count and floating-point operations, reported as GFLOPs per image. Statistical comparisons used the Friedman test followed by Nemenyi-corrected post hoc analyses (p < 0.05). Results: Segmentation quality was consistently high and clustered within a narrow range (Dice: 0.9168–0.9259), suggesting diminishing returns as backbone complexity increases. EfficientNet-B7 achieved the highest nominal accuracy (Dice: 0.9259 ± 0.0007; IoU: 0.8621 ± 0.0013); however, the differences in Dice score among EfficientNet-B0, B4, and B7 were not statistically significant (p > 0.05). In contrast, computational demands varied substantially (2.9–67.2 million parameters; 4.93–40.8 GFLOPs). EfficientNet-B0 provided an accurate and efficient operating point (Dice: 0.9244 ± 0.0011) at low computational cost (5.98 GFLOPs). MobileNetV3-Small had the lowest computational cost (4.93 GFLOPs; 2.9 million parameters) but also the lowest Dice score (0.9168 ± 0.0031). Compared with heavier ResNet and DenseNet variants, EfficientNet-B0 achieved competitive accuracy with a markedly lower computational footprint.
Conclusions: Larger models do not always perform better, and nominally higher accuracy does not necessarily amount to a meaningful gain. These findings are limited to tooth segmentation and may not hold for other tasks. Among the models evaluated, EfficientNet-B0 stands out as the most practical option, maintaining near-saturated accuracy while keeping model size and computational cost low.
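For readers unfamiliar with the two accuracy metrics named above, the following is a minimal sketch of how the Dice coefficient and IoU can be computed for one pair of binary masks. It uses the standard definitions and is not the study's evaluation code. The function name `dice_and_iou`, the flattened-list mask representation, the toy mask values, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as tooth in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2x4 masks flattened row by row (illustrative values, not study data).
pred = [1, 1, 1, 0, 0, 1, 0, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.4f}, IoU={iou:.4f}")
```

For a single mask pair the two metrics are linked by IoU = Dice / (2 − Dice). Applying this to the reported EfficientNet-B7 Dice of 0.9259 gives about 0.862, in line with the reported IoU of 0.8621. Scores averaged over images or folds need not satisfy the identity exactly.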