Abstract
BACKGROUND: Coexisting cataract and age-related macular degeneration (AMD) impose a substantial public health burden. Traditional linear statistical models often fail to capture complex, non-linear interactions among risk factors. This study aimed to develop an interpretable machine learning framework to predict comorbidity risk and to elucidate the synergistic effects of systemic and ocular factors.

METHODS: A retrospective case-control study was conducted in 640 participants (264 comorbidity cases and 376 controls). Fifteen multi-dimensional clinical features were extracted. Four machine learning algorithms (logistic regression, random forest, support vector machine [SVM], and XGBoost) were trained and validated. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), calibration curves, and decision curve analysis. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to provide global and local interpretability, respectively.

RESULTS: XGBoost achieved stronger discrimination (AUROC = 0.895, 95% CI: 0.85–0.93) and calibration than the other algorithms. SHAP analysis identified drusen severity and lens opacity (graded by the Lens Opacities Classification System III, LOCS III) as the dominant ocular predictors, while C-reactive protein (CRP) and smoking were the key systemic contributors. Notably, interaction analysis revealed a non-linear synergistic effect: smoking was associated with a disproportionately higher comorbidity risk in individuals aged >75 years, whereas the contribution of CRP plateaued beyond a threshold (a saturation effect). Decision curve analysis showed a high net clinical benefit for the model across a wide range of threshold probabilities.

CONCLUSION: This study establishes a robust, clinically applicable risk stratification tool for cataract and AMD comorbidity. By uncovering non-linear interactions among aging, lifestyle, and inflammation, it provides evidence to support personalized screening and preventive intervention.
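For readers less familiar with decision curve analysis, the sketch below illustrates how net benefit is conventionally computed at a threshold probability p_t: NB = TP/n − (FP/n) · p_t / (1 − p_t), compared against a "treat-all" strategy and a "treat-none" strategy (net benefit 0). This is a minimal, assumption-laden illustration, not the study's code: the labels and predicted risks are invented toy values, and the function names `net_benefit` and `treat_all_net_benefit` are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: classify every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels (1 = comorbidity) and predicted risks, invented for illustration only.
y = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
p = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.3, 0.7, 0.05, 0.4]

for pt in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"pt={pt:.1f}  model={net_benefit(y, p, pt):.3f}  "
          f"treat-all={treat_all_net_benefit(y, pt):.3f}  treat-none=0.000")
```

A model is usually considered clinically useful over the range of thresholds where its net benefit exceeds both reference strategies; plotting these three quantities against p_t yields the decision curve.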