Abstract
Background/Objectives: Accurate intraocular lens (IOL) power calculation is vital for achieving the desired postoperative spherical equivalent (SE) in cataract surgery. Generative artificial intelligence (AI) systems are increasingly being used in ophthalmology to support diagnosis and surgical planning. However, it is still unknown whether a low-cost, easily accessible generative AI model such as DeepSeek can match the accuracy of conventional biometric formulas. This study aimed to evaluate the accuracy of DeepSeek-R1, an open-source generative AI model, in predicting postoperative refractive SE compared with the Barrett Universal II formula in uncomplicated cataract surgery.
Methods: This study analyzed biometric data from 50 eyes of 50 patients who underwent cataract surgery between July 2024 and January 2025 at Humanitas Research Hospital in Milan, Italy. Only uncomplicated cases targeting emmetropia with Alcon AcrySof® SA60WF IOL implantation were included. Subjective refraction was measured 30 to 40 days postoperatively by an experienced optometrist using a calibrated trial frame and a 6 m logMAR chart. Prediction error (PE), median absolute error (MedAE), mean absolute error (MAE), standard deviation (SD), and the cumulative percentage of eyes within successive absolute-error thresholds were calculated. A Wilcoxon signed-rank test was used to compare paired per-eye absolute errors between the two methods, and McNemar tests compared the proportions of eyes within each threshold.
Results: Barrett Universal II showed a MedAE of 0.36 D [0.16-0.64] and an MAE of 0.43 D (95% CI, 0.34-0.52), whereas DeepSeek-R1 showed a MedAE of 0.76 D [0.52-1.01] and an MAE of 0.77 D (95% CI, 0.67-0.87). Cumulative accuracy (percentage of eyes with absolute error within ±0.25/±0.50/±0.75/±1.00/±1.25/±1.50/±1.75 D) was 37.7/71.7/81.1/92.5/100.0/100.0/100.0% for Barrett Universal II and 11.1/25.9/50.0/74.1/88.9/96.3/100.0% for DeepSeek-R1 (McNemar p < 0.01 at each threshold). The paired comparison of per-eye absolute errors favored Barrett Universal II (Wilcoxon signed-rank test, p < 0.0001).
Conclusions: In this cohort, Barrett Universal II outperformed DeepSeek-R1 in MedAE, MAE, and cumulative accuracy, and the paired difference in per-eye absolute error was statistically significant. A general-purpose generative model used off the shelf (fixed A-constant, no ophthalmology-specific tuning) did not match the accuracy of a validated vergence-based formula; established formulas remain the reference standard for clinical IOL power calculation.
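The error metrics named in the Methods can be illustrated with a minimal sketch. It assumes that PE is defined per eye as actual postoperative SE minus predicted SE, that SD refers to the SD of PE, and that cumulative accuracy counts eyes with absolute error less than or equal to each threshold; the function names and the sample values are hypothetical and are not the study's data or code.

```python
from statistics import mean, median, stdev

# Absolute-error thresholds (D) used for the cumulative-accuracy summary.
THRESHOLDS = (0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75)


def prediction_errors(actual_se, predicted_se):
    """Per-eye PE in diopters, assumed here as actual minus predicted SE."""
    if len(actual_se) != len(predicted_se):
        raise ValueError("actual and predicted SE must be paired per eye")
    return [a - p for a, p in zip(actual_se, predicted_se)]


def summarize(actual_se, predicted_se, thresholds=THRESHOLDS):
    """Hypothetical helper returning PE/AE summary statistics for one method."""
    pe = prediction_errors(actual_se, predicted_se)
    ae = [abs(x) for x in pe]  # absolute error per eye
    # Percentage of eyes whose absolute error falls within each threshold.
    cumulative = {
        t: 100.0 * sum(1 for x in ae if x <= t) / len(ae) for t in thresholds
    }
    return {
        "abs_errors": ae,
        "mean_pe": mean(pe),
        "sd_pe": stdev(pe),  # sample SD of the signed prediction errors
        "medae": median(ae),
        "mae": mean(ae),
        "cumulative_pct": cumulative,
    }


# Illustrative (made-up) values for five eyes, in diopters.
result = summarize([0.00, -0.25, 0.50, -0.75, 0.25], [0.25, -0.25, 0.00, 0.25, 0.25])
print(result["medae"], result["cumulative_pct"][0.50])
```

Given paired per-eye absolute-error lists for two methods, the Wilcoxon signed-rank comparison described in the Methods could then be run with `scipy.stats.wilcoxon(ae_method_a, ae_method_b)`.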