Abstract
INTRODUCTION: This study aimed to evaluate the performance of DentalMonitoring's (DM) artificial intelligence (AI) in detecting aligner tracking issues.

METHODS: This multicenter retrospective comparative study analyzed 3,323 assessments from 623 patients treated at multiple U.S. sites. DM's AI performance was evaluated using a binary model (seated vs. unseated) and a three-level model (seated, slight unseat, noticeable unseat). AI outputs were compared against a reference standard established through independent case reviews by a panel of three U.S.-based orthodontic residents. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated.

RESULTS: For the binary comparison (seated vs. unseated), sensitivity was 93.2%, specificity 86.2%, PPV 89.2%, and NPV 94.4%. For the three-level comparison, the noticeable-unseat category showed a sensitivity of 91.1%, a specificity of 90.5%, a PPV of 66.1%, and an NPV of 98.3%. The high NPVs in both models indicate that DM's AI was particularly reliable in ruling out clinically meaningful unseat events. The lower PPV in the noticeable-unseat category reflects the low prevalence of noticeable unseats in the dataset: with sensitivity and specificity held constant, PPV decreases as prevalence decreases.

CONCLUSION: DM's AI system demonstrated high sensitivity and NPV in identifying unseat events and in distinguishing noticeable from slight unseats within the positive subset. Within the dataset and parameters evaluated, the model performed reliably, particularly in minimizing false-negative assessments of clinically meaningful misfits. Further validation in independent cohorts and across broader clinical contexts is warranted to confirm generalizability.
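For readers less familiar with these accuracy measures, the following is a minimal Python sketch of how sensitivity, specificity, PPV, and NPV follow from a 2×2 table of AI output versus the reference standard, and of why PPV depends on prevalence (Bayes' rule). It is not the study's analysis code: the names `ConfusionCounts`, `diagnostic_metrics`, `ppv_at_prevalence`, and `npv_at_prevalence` are hypothetical, and the demonstration counts are invented for illustration only. The prevalence sweep reuses the sensitivity and specificity reported for the noticeable-unseat category, but the prevalence values are arbitrary examples, not the dataset's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 counts for one binary decision (positive = unseat)."""
    tp: int  # AI says unseat, reference says unseat
    fp: int  # AI says unseat, reference says seated
    fn: int  # AI says seated, reference says unseat
    tn: int  # AI says seated, reference says seated


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard diagnostic accuracy measures against a reference standard."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true-positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true-negative rate
        "ppv": c.tp / (c.tp + c.fp),          # P(reference unseat | AI unseat)
        "npv": c.tn / (c.tn + c.fn),          # P(reference seated | AI seated)
    }


def ppv_at_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """PPV implied by Bayes' rule for fixed sensitivity/specificity."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_at_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """NPV implied by Bayes' rule for fixed sensitivity/specificity."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Invented counts for illustration only; these are not the study's data.
    demo = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
    for name, value in diagnostic_metrics(demo).items():
        print(f"{name}: {value:.3f}")

    # Sensitivity/specificity fixed at the reported noticeable-unseat values;
    # the prevalences are arbitrary examples.
    for prev in (0.50, 0.20, 0.05):
        ppv = ppv_at_prevalence(0.911, 0.905, prev)
        npv = npv_at_prevalence(0.911, 0.905, prev)
        print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Running the sweep shows PPV falling and NPV rising as prevalence drops, even though sensitivity and specificity never change. This is the pattern the abstract attributes to the low prevalence of noticeable unseats.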