Abstract
Objective: To evaluate the performance of vision transformer (ViT)-based deep learning models in classifying open apex on panoramic radiographs (orthopantomograms, OPGs) and to compare their diagnostic accuracy with that of conventional convolutional neural network (CNN) architectures.
Materials and Methods: OPGs were retrospectively collected and labeled by two observers according to apex closure status. Two ViT models (Base Patch16 and Patch32) and three CNN models (ResNet50, VGG19, and EfficientNetB0) were each evaluated with eight classifiers: support vector machine (SVM), random forest (RF), XGBoost, logistic regression (LR), K-nearest neighbors (KNN), naïve Bayes (NB), decision tree (DT), and multi-layer perceptron (MLP). Accuracy, precision, recall, F1 score, and area under the curve (AUC) were computed as performance metrics.
Results: ViT Base Patch16 384 with MLP achieved the highest accuracy (0.8462 ± 0.0330) and AUC (0.914 ± 0.032). Although CNN models such as EfficientNetB0 + MLP performed competitively (accuracy 0.8334 ± 0.0479), ViT models demonstrated more balanced and robust performance.
Conclusions: ViT models outperformed CNNs in classifying open apex, which supports their integration into dental radiologic decision support systems. Future studies should use multi-center and multimodal data to improve generalizability.
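As an illustration of how the reported metrics can be computed, the following minimal sketch implements accuracy, precision, recall, F1 score, and AUC for a binary task in plain Python, together with a mean ± standard deviation summary. It is not the study's code. The function names, the 0.5 decision threshold, the choice of "open apex" as the positive class (label 1), and the reading of the ± values as a sample standard deviation over repeated runs or folds are all assumptions that the abstract does not state.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels.

    Label 1 is treated as the positive class (assumed here: open apex).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_sd(values):
    """Mean and sample standard deviation, e.g. over per-fold accuracies."""
    return mean(values), stdev(values)


# Toy data for illustration only; these are not values from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

toy_metrics = binary_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for a small illustration; a library routine such as scikit-learn's `roc_auc_score` would be the more usual choice in practice.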