Abstract
BACKGROUND: Retinopathy of prematurity (ROP) has emerged as one of the leading causes of visual impairment and blindness in newborn infants worldwide. The purpose of this study was to develop a predictive model for ROP using machine learning methods.

METHODS: A retrospective study was conducted on 586 neonates who were admitted to the Department of Neonatology at the First Affiliated Hospital of Guangxi Medical University between January 2019 and January 2024, met the inclusion criteria, and underwent ROP screening. (1) ROP-related risk factors were collected by reviewing electronic medical records from hospitalization and follow-up outpatient visits. (2) Lasso regression was applied to screen the candidate ROP-related risk factors and identify significant predictors, which were then used to construct seven machine learning models. Model performance was evaluated and compared using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, precision, sensitivity, specificity, F1-score, and kappa coefficient.

RESULTS: (1) Lasso regression screened 109 candidate ROP-related risk factors and identified 46 significant predictors, which were used to construct the seven machine learning models. (2) The random forest (RF) algorithm performed best among the models. Training set: AUC 1.000; accuracy 99.7%; precision 99.7%; sensitivity 99.7%; specificity 99.7%; F1-score 0.997; kappa coefficient 0.994. Testing set: AUC 0.981; accuracy 95.7%; precision 92.3%; sensitivity 66.7%; specificity 99.3%; F1-score 0.774; kappa coefficient 0.751.

CONCLUSION: The RF predictive model based on 46 significant ROP-related risk factors shows strong predictive value for ROP occurrence. This model provides a useful tool for the early clinical identification of infants at high risk of ROP.
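The Lasso screening step named in METHODS can be illustrated with a small, dependency-free sketch. The abstract does not say which Lasso variant, penalty value, or penalty-selection rule was used, so this example assumes plain squared-error Lasso fitted by cyclic coordinate descent on standardized features with a fixed, hypothetical penalty `lam`, and it runs on synthetic data rather than the study's records. For a binary outcome such as ROP, an L1-penalized logistic model with a cross-validated penalty is a common alternative. The function names below are hypothetical helpers, not part of any library.

```python
import random


def standardize(columns):
    """Center each feature column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out


def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty: shrink z toward 0 by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes standardized columns (mean 0, variance 1) and centered y,
    so no intercept is fitted and each coordinate update has denominator 1.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # r = y - X beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Correlation of feature j with the residual, adding back its own contribution.
            rho = sum(xj[i] * residual[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= xj[i] * delta
                beta[j] = new_bj
    return beta


# Synthetic example: only features 0 and 1 influence the outcome.
rng = random.Random(0)
n, p = 200, 6
raw = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y_raw = [3 * raw[0][i] - 2 * raw[1][i] + 0.1 * rng.gauss(0, 1) for i in range(n)]
X = standardize(raw)
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, so "screening" amounts to keeping the features whose coefficients survive. In a real analysis the penalty would typically be tuned (for example by cross-validation) rather than fixed as here.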
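All of the threshold-based metrics listed in METHODS except AUC can be derived from a single 2x2 confusion matrix. The sketch below assumes the standard definitions, with the kappa coefficient computed as Cohen's kappa (observed agreement corrected for chance agreement). The counts passed in are hypothetical, because the abstract does not report the confusion matrix; they were picked only so that the printed values round to the reported test-set figures, and the study's actual counts may differ.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute threshold-based classification metrics from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what the marginal totals would produce by chance.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = binary_metrics(tp=12, fp=1, fn=6, tn=142)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With only 18 positives among 161 cases in this illustration, a handful of missed cases lowers sensitivity sharply while specificity stays near 1, which shows why sensitivity, F1-score, and kappa are informative alongside accuracy. AUC is not covered here because it requires the ranked predicted probabilities rather than one thresholded confusion matrix.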