Abstract
BACKGROUND: Reliable identification of embryo ploidy is essential for optimising outcomes in assisted reproductive technology (ART). Conventional deep learning models, however, are limited by class imbalance, particularly the underrepresentation of mosaic embryos.

AIM: This study aimed to improve embryo ploidy classification by integrating Vision Transformers (ViTs) with sequential time-lapse imaging and applying random undersampling (RUS) to mitigate class imbalance.

SETTINGS AND DESIGN: This retrospective study used blastocyst-stage time-lapse imaging data from a fertility clinic. Customised deep learning models were developed to predict embryo ploidy status.

MATERIALS AND METHODS: A total of 1020 blastocyst videos with genetically confirmed ploidy were analysed, generating 99,324 sequential frames representing the final 10 h of development before biopsy. To address the imbalance, RUS produced a balanced dataset of 17,000 images per class (euploid, aneuploid and mosaic). Two ViT architectures (ViT-B/16 and ViT-B/32) were fine-tuned for binary and multiclass tasks and evaluated on both the balanced and imbalanced datasets.

STATISTICAL ANALYSIS USED: Model performance was evaluated using accuracy, precision, recall, and F1-score. A 5-fold cross-validation procedure was applied to assess robustness and reduce variance across data splits.

RESULTS: ViT-B/16 achieved an accuracy of 0.84 in binary and 0.67 in multiclass classification on the balanced dataset, whereas performance dropped to 0.49 on the imbalanced dataset. RUS improved the prediction of minority classes, particularly mosaic embryos.

CONCLUSION: Combining ViTs with sequential time-lapse imaging and RUS provides a promising non-invasive approach for embryo ploidy classification, improving accuracy for mosaic embryos and supporting more informed embryo selection in ART.
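For readers unfamiliar with random undersampling, the idea can be illustrated with a minimal Python sketch. This is not the study's implementation: the function name `random_undersample`, its `n_per_class` and `seed` parameters, and the toy label counts are assumptions introduced purely for illustration. The sketch reduces every class to the same number of items by sampling uniformly at random without replacement, defaulting to the size of the rarest class.

```python
import random
from collections import Counter, defaultdict


def random_undersample(labels, n_per_class=None, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Hypothetical helper (not from the study). Each class is reduced to
    `n_per_class` items drawn uniformly at random without replacement;
    by default the target is the size of the rarest class.
    """
    # Group item indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    smallest = min(len(idxs) for idxs in by_class.values())
    target = smallest if n_per_class is None else n_per_class
    if target > smallest:
        raise ValueError("n_per_class exceeds the size of the rarest class")

    # A seeded generator keeps the subset reproducible.
    rng = random.Random(seed)
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


# Toy, made-up label distribution (assumption; not the study's data).
labels = ["euploid"] * 50 + ["aneuploid"] * 30 + ["mosaic"] * 8
kept = random_undersample(labels, seed=42)
balanced = Counter(labels[i] for i in kept)
print(balanced)  # every class now has the same count as the rarest class
```

In this toy example each class is reduced to 8 items, the size of the smallest class. Passing `n_per_class` sets a fixed per-class target instead, provided it does not exceed the size of the rarest class.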