Abstract
Objectives: This study evaluated the accuracy of artificial intelligence (AI)-based cephalometric software (WebCeph, version 2.0.0) against manual tracing and examined whether stratifying patients by chronological age or by dentition stage provides a more clinically relevant assessment of AI accuracy. Methods: Three hundred lateral cephalometric radiographs of Vietnamese patients were traced manually by an orthodontist (reference standard) and analyzed automatically by WebCeph. Intra-observer reliability was assessed using the intraclass correlation coefficient (ICC) and Dahlberg's error. The data were analyzed under three stratification strategies: (1) overall; (2) chronological age (<18, 18-25, >25 years); and (3) dentition stage (<9 years, primary-early mixed; 9-12 years, late mixed; >12 years, permanent). The primary outcome was the absolute measurement difference (|Δ|) between WebCeph and manual tracing, compared across groups using the Kruskal-Wallis test with effect size (η²). Results: Overall, WebCeph showed high concordance with manual tracing (ICC > 0.80 for most parameters). Chronological age was only weakly associated with measurement error; between-group differences were largely non-significant (p > 0.05), with a small effect size (η² ≈ 0.015). In contrast, stratification by dentition stage revealed significant performance disparities (p < 0.05). Notably, accuracy for the Mandibular Arc (ICC = 0.349) and Mandibular Plane Angle (p = 0.048) degraded significantly in the primary-early mixed group, a vulnerability obscured by chronological age-based stratification. Conclusions: Dentition stage is a more sensitive and biologically relevant predictor of AI accuracy than chronological age. While WebCeph is reliable for permanent dentition, its accuracy degrades significantly in the primary-early mixed phase. Clinicians should prioritize manual verification of mandibular and incisor landmarks in mixed-dentition children.
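For readers who wish to reproduce the statistical quantities named above, the following is a minimal sketch, not the authors' analysis code. It computes Dahlberg's error for paired repeat tracings, the Kruskal-Wallis H statistic on |Δ| values, and an effect size derived from H. The abstract does not state which η² estimator was used; the sketch assumes the commonly used rank-based form η² = (H − k + 1)/(n − k), where k is the number of groups and n the total sample size. All function names and the example values are hypothetical and are not study data.

```python
import math
from itertools import chain


def dahlberg_error(first, second):
    """Dahlberg's error, sqrt(sum(d_i^2) / (2n)), for paired repeat measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared / (2 * len(first)))


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def eta_squared_h(h, n_groups, n_total):
    """Assumed estimator: eta^2 = (H - k + 1) / (n - k)."""
    return (h - n_groups + 1) / (n_total - n_groups)


# Hypothetical |delta| values (AI minus manual, absolute) for three strata.
abs_delta_by_stage = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat = kruskal_wallis_h(abs_delta_by_stage)
effect = eta_squared_h(h_stat, n_groups=3, n_total=9)
print(f"H = {h_stat:.3f}, eta squared = {effect:.3f}")
```

A p-value would additionally require the chi-square distribution with k − 1 degrees of freedom, which a statistics library can supply; it is omitted here to keep the sketch dependency-free.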