Abstract
BACKGROUND: Manual grading of dental students’ embroidery assignments is labor-intensive and subjective. To address these limitations, this study proposes an automated grading model built on the ResNet-50 architecture and enhanced with a multi-region aggregation mechanism. The approach aims to standardize the grading process and to improve the fairness and efficiency of assessment. METHODS: A total of 381 embroidery assignment images were collected from the 2020–2023 student cohorts. The 2022 cohort was held out as an external test set to assess how well the models generalize to a different data distribution. We built a multi-region aggregation mechanism on ResNet-50 and compared two aggregation strategies: multi-head attention (MHA) aggregation and average weighting (AW) aggregation. VGG-16, DenseNet-121, ViT, and ResNet-50 served as baseline models. All models were trained with 5-fold cross-validation using a weighted CrossEntropyLoss to address class imbalance, and were evaluated by accuracy, precision, recall, and F1 score. RESULTS: ResNet-50 AW achieved the highest accuracy on the test set (80%), followed by ViT and VGG-16 (75% each). Although the models’ performance degraded on the external test set, ResNet-50 AW still achieved the highest accuracy there (64%) and reduced misclassifications of grade B and C samples. Despite excelling on the validation set, ResNet-50 MHA performed similarly to the baseline ResNet-50 on the test set. ViT and VGG-16 achieved higher accuracy on grade A samples on both the test set and the external test set. CONCLUSION: The ResNet-50 AW model, with its multi-region aggregation mechanism, highlights the potential of deep learning to automate the grading of artistic assignments. Further validation of the model’s generalization is needed, and future work should improve dataset quality and diversity and enhance the system's interpretability to make grading more accurate and transparent.
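The abstract names AW aggregation and a weighted CrossEntropyLoss without spelling out either. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that AW means an unweighted mean of per-region class logits; the source does not say whether features or logits are averaged, or how regions are cropped. It also assumes inverse-frequency class weights of the form N / (K · count_k), the "balanced" heuristic used by scikit-learn's `compute_class_weight`, since the paper's actual weights are not given. The loss reproduces the normalization that PyTorch's `CrossEntropyLoss(weight=...)` applies with the default `reduction='mean'`, which divides by the summed weights of the targets. All function names and example values are hypothetical. Plain Python is used so the sketch runs without PyTorch.

```python
import math
from collections import Counter


def average_weighting(region_logits):
    """Hypothetical AW aggregation: element-wise mean of per-region class logits."""
    n_regions = len(region_logits)
    n_classes = len(region_logits[0])
    return [sum(r[c] for r in region_logits) / n_regions for c in range(n_classes)]


def inverse_frequency_weights(labels, n_classes):
    """Assumed class weights N / (K * count_k); absent classes get weight 0."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (n_classes * counts[k]) if counts[k] else 0.0 for k in range(n_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean cross-entropy: sum_i w[y_i] * -log p_i[y_i] / sum_i w[y_i]."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        num += -weights[y] * log_softmax(logits)[y]
        den += weights[y]
    return num / den


if __name__ == "__main__":
    # Hypothetical logits from 4 image regions over grades A, B, C (indices 0, 1, 2).
    regions = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [0.5, 1.5, 0.0], [1.0, 0.0, -1.0]]
    image_logits = average_weighting(regions)
    # Hypothetical imbalanced training labels, used only to derive class weights.
    train_labels = [0, 0, 0, 1, 1, 2]
    w = inverse_frequency_weights(train_labels, n_classes=3)
    print("aggregated logits:", image_logits)
    print("weighted loss (true grade A):", weighted_cross_entropy([image_logits], [0], w))
```

Averaging treats every region as equally informative. An MHA aggregator would instead learn data-dependent weights over regions, which adds parameters. On a dataset of 381 images, that extra capacity could plausibly explain why MHA looked strong on validation but did not transfer to the test set, although the abstract does not establish this.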