Abstract
INTRODUCTION: Refractive error and dry eye are highly prevalent ocular conditions that impair quality of life and impose a substantial burden on individuals and society. Growing evidence suggests that the two conditions are correlated. This study aimed to develop and validate a machine learning (ML) model to predict the risk of comorbid dry eye in patients with refractive error.
METHODS: Data from the Xiamen Eye Center outpatient database (1 January 2024 to 28 February 2025) were analyzed (n = 114,579). Hyperparameter optimization, Spearman correlation analysis, and logistic regression analyses were performed. The final feature set was determined using a Random Forest algorithm with sequential forward selection. Eight ML algorithms were evaluated using ten-fold cross-validation. The optimal model was selected on the basis of receiver operating characteristic curves, precision-recall curves, and decision curve analysis. For the best-performing model, SHapley Additive exPlanations (SHAP) and partial dependence plots were used to interpret the importance and interactions of risk factors.
RESULTS: Baseline characteristics were comparable between the training set and the internal test set, whereas multiple baseline characteristics differed significantly between the dry eye and non-dry eye groups among subjects with refractive error. Using ten selected features, the tabular prior-data fitted network (TabPFN) model performed best, showing high screening efficacy with specificity and accuracy both reaching 0.945. Interaction analysis showed that a longer duration of refractive error was associated with a higher risk of dry eye, and this relationship was particularly pronounced among older and female patients. An online web calculator was developed to deploy this diagnostic prediction model.
DISCUSSION: This study developed a high-performing, interpretable ML system from a large-scale real-world clinical dataset for the early prediction of concurrent dry eye risk in patients with refractive error. The system has potential as a predictive aid for clinical decision-making, supporting more timely and personalized patient management.
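The feature selection and screening metrics named in METHODS and RESULTS can be illustrated with a minimal Python sketch. It is not the study's code: the feature names, the gain values, and the `toy_score` function are hypothetical stand-ins for the cross-validated Random Forest score that the selection step would use in practice, and the labels passed to `specificity_and_accuracy` are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[str],
    score_fn: Callable[[Tuple[str, ...]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the single feature that most improves score_fn each round."""
    selected: List[str] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        # Score every one-feature extension of the current subset.
        scored = [(score_fn(tuple(selected + [f])), f) for f in remaining]
        round_score, round_feature = max(scored)
        if round_score <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(round_feature)
        best_score = round_score
    return selected


def specificity_and_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (specificity, accuracy) for binary labels, with 1 = dry eye."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    specificity = tn / (tn + fp)          # true negatives among all negatives
    accuracy = (tp + tn) / len(y_true)    # correct predictions among all cases
    return specificity, accuracy


# Hypothetical per-feature gains; in practice the score would come from
# cross-validating a classifier on each candidate subset.
TOY_GAINS = {"duration_years": 0.20, "age": 0.10, "sex": 0.05, "noise": 0.00}


def toy_score(subset: Tuple[str, ...]) -> float:
    """Toy scorer: summed gains minus a small penalty per added feature."""
    return 0.5 + sum(TOY_GAINS[f] for f in subset) - 0.01 * len(subset)


chosen = sequential_forward_selection(list(TOY_GAINS), toy_score, max_features=4)
spec, acc = specificity_and_accuracy(
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
)
print(chosen, spec, acc)
```

With the toy scorer, the uninformative `noise` feature is never added because it lowers the penalized score, so selection stops at three features even though four were allowed.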