Abstract
Background
Orthodontic diagnostic workflows often rely on manual classification and archiving of large volumes of patient images, a process that is time-consuming and prone to errors such as mislabeling and incomplete documentation. These errors can compromise treatment accuracy and overall patient care. To address them, we propose an artificial intelligence (AI)-driven deep learning framework based on convolutional neural networks (CNNs) that automates the classification and archiving of orthodontic diagnostic images, with the aim of improving workflow efficiency and reducing human error. This study is an initial step towards fully automated orthodontic diagnosis and treatment planning systems and focuses specifically on automating the classification of orthodontic diagnostic records.

Methods
The dataset comprised 61,842 images collected from three dental clinics and distributed across 13 categories. A sequential classification approach was developed: a primary model first categorized each image into one of three main groups (extraoral, intraoral, or radiographic), and secondary models applied within each group then performed the final classification. The proposed model, enhanced with attention modules, was trained and compared with pre-trained models such as ResNet50 (Microsoft Corporation, Redmond, Washington, United States) and InceptionV3 (Google LLC, Mountain View, California, United States). External validation was performed using 13,729 new samples to assess the AI system's accuracy and generalizability compared to expert assessments.

Results
The deep learning framework achieved an accuracy of 99.24% on the external validation set, a performance almost on par with that of human experts, and processed images significantly faster than manual methods. Gradient-weighted class activation mapping (Grad-CAM) visualizations confirmed that the model focused on clinically relevant features during classification, further supporting its clinical applicability.

Conclusion
This study introduces a deep learning framework for automating the classification and archiving of orthodontic diagnostic images. The model achieved high accuracy on external validation, and Grad-CAM visualizations showed that it attends to clinically relevant features. The framework also offers significant improvements in processing speed over manual methods, making it a viable tool for real-time applications in orthodontics. This approach not only reduces the workload in healthcare settings but also lays the foundation for future automated diagnostic and treatment planning systems in digital orthodontics.
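
The sequential classification approach described in the Methods can be illustrated with a minimal Python sketch. It shows only the routing logic: a primary classifier assigns one of the three main groups, and the secondary classifier registered for that group produces the final label. The function name `build_sequential_classifier`, the dictionary-based toy "images", and the example label `lateral_cephalogram` are hypothetical and chosen for illustration; the actual CNNs, attention modules, and the 13 category names are not reproduced here.

```python
from typing import Any, Callable, Dict, Tuple

# A classifier is any callable that maps an image to a label string.
# In the study these would be CNNs; here they are stand-in callables.
Classifier = Callable[[Any], str]

# The three main groups named in the Methods.
GROUPS = ("extraoral", "intraoral", "radiographic")


def build_sequential_classifier(
    primary: Classifier,
    secondary: Dict[str, Classifier],
) -> Callable[[Any], Tuple[str, str]]:
    """Compose a primary model with one secondary model per main group."""
    missing = [g for g in GROUPS if g not in secondary]
    if missing:
        raise ValueError(f"no secondary model for groups: {missing}")

    def classify(image: Any) -> Tuple[str, str]:
        # Stage 1: coarse group (extraoral / intraoral / radiographic).
        group = primary(image)
        if group not in secondary:
            raise ValueError(f"primary model returned unknown group: {group!r}")
        # Stage 2: final category, decided by that group's secondary model.
        return group, secondary[group](image)

    return classify


# Toy stand-ins (hypothetical): each "image" is a dict that already carries
# its answers, so the example runs without any deep learning library.
def toy_primary(image: Dict[str, str]) -> str:
    return image["group"]


def toy_secondary(image: Dict[str, str]) -> str:
    return image["view"]


classify = build_sequential_classifier(
    toy_primary, {group: toy_secondary for group in GROUPS}
)
result = classify({"group": "radiographic", "view": "lateral_cephalogram"})
print(result)
```

One consequence of this design is that each secondary model only has to separate categories within its own group, while any error made by the primary model propagates to the final label.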