Abstract
PURPOSE: Retinopathy of prematurity (ROP) is a leading cause of blindness in infants, and early, accurate screening is essential. Deep learning systems can assist screening, but their accuracy drops when they are applied to populations different from those on which they were trained. We aimed to identify the best-performing deep learning model for detecting plus disease in ROP and to evaluate it on a multi-centre dataset. METHODS: We built a cross-centre retinal image database of 2,635 images by merging public and private datasets: FARFUM-RoP, HVDROPDB, LAN-RoP, and images of preterm infants born at <34 weeks' gestational age (GA) from three tertiary neonatal intensive care units (NICUs). Nine model architectures were compared: ResNet34, ResNet50, DenseNet, Inception, MobileNet, VGG16, VGG19, EfficientNet and Swin Transformer. The comparison used floating-point operations (FLOPs), parameter count, accuracy, recall, precision and F1 score. ResNet50 offered the best balance between computational cost and classification performance and was selected for further evaluation. Ten-fold cross-validation was then performed on FARFUM-RoP alone, on LAN-RoP alone, and on their combined set. RESULTS: ResNet50 achieved areas under the receiver operating characteristic curve (AUCs) of 0.97 (95% CI 0.94–0.99), 0.95 (95% CI 0.91–0.98) and 1.00 (95% CI 0.97–1.00) for Normal, Pre-plus and Plus disease, respectively. When trained on the consolidated multi-centre cohort, the model performed best overall, with an accuracy of 92.60%, recall of 92.58%, precision of 92.69% and F1 score of 92.60%. All of these metrics exceeded those obtained with any single-centre training set. CONCLUSION: Compared with single-centre training, the cross-centre fusion strategy significantly enhanced the generalisability of the artificial intelligence model, yielded superior diagnostic indices, and improved diagnostic accuracy for infants from diverse demographic backgrounds.
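The abstract reports accuracy, recall, precision and F1 score for a three-class task (Normal, Pre-plus, Plus) but does not state how per-class scores were combined. The following is a minimal Python sketch that assumes macro averaging (an unweighted mean over classes). The function name `macro_metrics` and the toy confusion matrix are hypothetical illustrations, not the study's code or data.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.

    cm[i][j] counts images whose true class is i and predicted class is j.
    Class order is assumed to be (Normal, Pre-plus, Plus); this is a
    hypothetical convention for illustration only.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))  # diagonal = correct predictions

    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(k))  # column sum
        actual_c = sum(cm[c])                          # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    # Macro averaging: each class counts equally, regardless of its size.
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Hypothetical toy confusion matrix, NOT data from the study.
    toy_cm = [
        [8, 2, 0],
        [1, 7, 2],
        [0, 1, 9],
    ]
    m = macro_metrics(toy_cm)
    print({name: round(value, 4) for name, value in m.items()})
    # prints {'accuracy': 0.8, 'precision': 0.8024, 'recall': 0.8, 'f1': 0.7997}
```

Under macro averaging, the F1 score is the mean of the per-class F1 scores, so it need not equal the harmonic mean of macro precision and macro recall (about 0.8012 for the toy matrix, versus a macro F1 of 0.7997).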