Abstract
BACKGROUND: Artificial intelligence (AI) is increasingly used in orthodontics to automate cephalometric analysis. Despite promising results for AI-based landmark detection on 2D cephalograms, the reliability of AI analyses performed on 3D CBCT data by commercial systems remains unclear. The aim of this study was to evaluate whether AI-driven cephalometric analyses performed on CBCT datasets by two commercial software platforms (CephX and Invivo) are interchangeable with reference 2D cephalometric analysis based on manual landmark identification.
MATERIALS AND METHODS: This retrospective study included 135 patients (mean age: 26.8 ± 10.1 years) who underwent concurrent CBCT and lateral cephalometric imaging. Analyses (Steiner and Ricketts methods) based on manual landmark identification served as the reference standard. For each variable, the mean difference (AI - reference) was calculated with its 95% confidence interval (CI), and Bland-Altman plots were constructed. The intrareader repeatability of the manual reference was assessed with the intraclass correlation coefficient, ICC(3). For clinical interpretability, commonly accepted tolerance bands (angles: ±1-2°; linear measurements: ±2 mm) were used.
RESULTS: Both AI systems showed systematic bias for key skeletal and dental measures. For CephX, SNA differed by +3.40° (95% CI +1.66 to +5.14), SNB by +5.88° (+4.33 to +7.43), and ANB by -2.48° (-4.38 to -0.58). For Invivo, SNA differed by +5.07° (+2.40 to +7.74), SNB by +4.40° (+2.64 to +6.16), and ANB by +5.55° (+0.77 to +10.33). Many parameters also had wide limits of agreement, indicating poor precision for individual patients. Several variables suggested proportional bias (greater discrepancy at larger values). The intrareader ICCs for the manual reference were 0.91-0.95. Overall, the 95% CIs for sentinel angles commonly exceeded ±2°, which argues against clinical interchangeability.
CONCLUSIONS: In their current form, the evaluated AI CBCT cephalometry tools do not match manual 2D cephalometry closely enough for standalone clinical use. Differences of 3-6° in the core sagittal angles are large enough to change the diagnosis and treatment plans. AI outputs should be reviewed and, when necessary, corrected by clinicians.
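
The agreement statistics named in the methods (mean difference with 95% CI, Bland-Altman limits of agreement, and a check for proportional bias) can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names `bland_altman` and `ci_within_band`, the normal-approximation multiplier of 1.96 for both the CI and the limits of agreement, and the use of a least-squares slope of difference on pairwise mean as the proportional-bias indicator are all assumptions made for illustration. The sample values in the usage lines are invented and are not study data.

```python
import math
import statistics


def bland_altman(ai, ref, z=1.96):
    """Agreement statistics for paired measurements (AI - reference).

    Assumption: z = 1.96 (normal approximation) is used for both the
    95% CI of the mean difference and the limits of agreement.
    """
    if len(ai) != len(ref) or len(ai) < 3:
        raise ValueError("need at least 3 paired measurements of equal length")

    diffs = [a - r for a, r in zip(ai, ref)]        # AI - reference
    means = [(a + r) / 2 for a, r in zip(ai, ref)]  # x-axis of a Bland-Altman plot
    n = len(diffs)

    bias = statistics.fmean(diffs)                  # mean difference
    sd = statistics.stdev(diffs)                    # sample SD of the differences
    se = sd / math.sqrt(n)                          # standard error of the mean difference

    # Least-squares slope of difference on pairwise mean; a slope far
    # from zero suggests proportional bias.
    mean_of_means = statistics.fmean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx if sxx > 0 else float("nan")

    return {
        "bias": bias,
        "ci95": (bias - z * se, bias + z * se),
        "loa": (bias - z * sd, bias + z * sd),
        "slope": slope,
    }


def ci_within_band(ci, band):
    """True if the whole CI lies inside the tolerance band [-band, +band]."""
    low, high = ci
    return -band <= low and high <= band


# Invented example values in degrees (not study data).
ai_sna = [83.0, 85.0, 80.0, 88.0]
ref_sna = [80.0, 81.0, 78.0, 83.0]
result = bland_altman(ai_sna, ref_sna)
print(result)
print("CI within ±2°:", ci_within_band(result["ci95"], 2.0))
```

With only a handful of pairs, a t-distribution multiplier would be more appropriate than 1.96; for a sample of the size reported here (135 patients) the two are close.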