Abstract
IMPORTANCE: Retinopathy of prematurity (ROP) screening requires frequent examinations to avoid missing treatment-requiring disease, but this approach is burdensome for infants, families, and health systems. Whether precision risk models could reduce examination burden without compromising safety is not known. OBJECTIVE: To develop and externally validate an interpretable risk model integrating gestational age (GA), postmenstrual age (PMA), and vascular severity (artificial intelligence-derived vascular severity score [VSS] or clinician-assigned P-score) to predict the 2-week risk of treatment-requiring retinopathy of prematurity (TR-ROP), and to estimate its impact on screening frequency. DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study used data from the Imaging and Informatics in ROP (i-ROP) consortium (2011-2022) and the Stanford University Network for Diagnosis of ROP (SUNDROP; 2013-2021). The i-ROP dataset was split into training, validation, and test subsets, and SUNDROP served as an external validation cohort. A subset of i-ROP examinations with clinician-assigned P-scores was analyzed to assess clinical adaptability. Data were analyzed from September 1, 2024, to September 1, 2025. EXPOSURES: GA, PMA, and vascular severity (VSS or P-score). MAIN OUTCOMES AND MEASURES: Discrimination (area under the receiver operating characteristic curve [AUROC] and area under the precision-recall curve [AUPRC]), sensitivity, and specificity for predicting TR-ROP within 2 weeks, and the proportion of examinations that could be deferred in a retrospective simulation while maintaining 100% sensitivity. RESULTS: Among 559 infants in the i-ROP training dataset and 1544 in SUNDROP, infants who developed TR-ROP had a mean GA 2.9 (95% CI, 2.3-3.5) weeks lower and a mean birth weight 391 (95% CI, 328-454) g lower than those who did not. Adding vascular severity improved discrimination compared with GA alone (AUROC difference, 0.13 [95% CI, 0.06-0.19] in i-ROP and 0.08 [95% CI, 0.03-0.14] in SUNDROP).
A decision threshold achieved 100% sensitivity with moderate specificity (i-ROP, 63%; SUNDROP, 73%). Simulated risk-based scheduling eliminated 28% (376 of 1384) of examinations in i-ROP and 39% (2356 of 6090) in SUNDROP without missing any case of TR-ROP. Substituting P-scores for VSS preserved model performance (AUROC, 0.87 [95% CI, 0.76-0.97]; sensitivity, 100% [95% CI, 63%-100%]). CONCLUSIONS AND RELEVANCE: In this diagnostic study, the externally validated, clinically adaptable model provided individualized, visit-level TR-ROP risk assessment, with potential to improve screening efficiency by reducing unnecessary examinations without missing TR-ROP. The model will be made publicly available for further validation; however, prospective evaluation within a defined clinical workflow is required before routine implementation.
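The abstract reports a decision threshold that keeps sensitivity at 100% and a simulated count of deferrable examinations. The sketch below illustrates one simple way such quantities can be computed from visit-level predicted risks. It is a minimal illustration under stated assumptions, not the authors' method: the `Exam` type, the helper functions, and all risk values are hypothetical toy data, and "deferrable" is simplified to mean any examination whose predicted risk falls below the threshold. In a real evaluation, a threshold chosen on development data would then be checked on held-out and external data (as the abstract describes for the i-ROP test subset and SUNDROP); here it is chosen and evaluated on the same toy set for brevity.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    risk: float              # hypothetical model-predicted 2-week TR-ROP risk
    tr_rop_within_2wk: bool  # whether TR-ROP was diagnosed within 2 weeks


def threshold_for_full_sensitivity(exams):
    """Highest threshold that still flags every positive exam.

    An exam is flagged (examined as scheduled) when risk >= threshold, so the
    lowest risk among positive exams is the highest threshold that keeps
    sensitivity at 100%.
    """
    return min(e.risk for e in exams if e.tr_rop_within_2wk)


def evaluate(exams, threshold):
    """Return sensitivity, specificity, and the fraction of exams below threshold."""
    tp = sum(1 for e in exams if e.tr_rop_within_2wk and e.risk >= threshold)
    fn = sum(1 for e in exams if e.tr_rop_within_2wk and e.risk < threshold)
    tn = sum(1 for e in exams if not e.tr_rop_within_2wk and e.risk < threshold)
    fp = sum(1 for e in exams if not e.tr_rop_within_2wk and e.risk >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Simplification (assumption): every below-threshold exam counts as deferrable.
    deferrable = (tn + fn) / len(exams)
    return sensitivity, specificity, deferrable


# Toy data for illustration only; not taken from i-ROP or SUNDROP.
exams = [
    Exam(0.02, False), Exam(0.05, False), Exam(0.08, False), Exam(0.10, False),
    Exam(0.25, False), Exam(0.30, True), Exam(0.40, False), Exam(0.60, True),
]

t = threshold_for_full_sensitivity(exams)
sens, spec, deferred = evaluate(exams, t)
print(f"threshold={t:.2f} sensitivity={sens:.0%} specificity={spec:.0%} deferrable={deferred:.1%}")
# prints: threshold=0.30 sensitivity=100% specificity=83% deferrable=62.5%
```

Because the threshold equals the lowest risk among positive examinations, no positive falls below it; specificity then depends on how many negative examinations score under that value, which is why the abstract pairs 100% sensitivity with only moderate specificity.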