Abstract
PURPOSE: To evaluate the performance of three large language models (LLMs) in the automated recognition of IOLMaster 700 reports and preoperative toric intraocular lens (IOL) planning.

METHODS: This retrospective study analyzed preoperative examination reports of patients who underwent cataract surgery with toric IOL implantation. Three models (ChatGPT-5, ChatGPT-5 Thinking, and DeepSeek Thinking) were instructed to extract key biometric parameters, evaluate each patient's suitability for toric IOL implantation, and generate a surgical plan. Model performance was evaluated on structured-data recognition, refractive prediction outcomes, and thinking time.

RESULTS: Fifty-four eyes of 54 patients were analyzed. The ChatGPT-5 Thinking model consistently achieved the highest agreement with the clinical reference across all extracted parameters and demonstrated more reliable extraction of axis information. ChatGPT-5 showed intermediate performance, while DeepSeek Thinking was the least consistent in axis-dependent fields but performed adequately for basic biometry. Refractive and axis prediction errors were smallest with ChatGPT-5 Thinking, which yielded the largest proportion of cases within prespecified clinical thresholds and the highest concordance with the calculator-based reference plan. Analysis of thinking times showed that longer processing did not necessarily correlate with better accuracy.

CONCLUSIONS: Advanced LLMs show promise for the automated interpretation of ophthalmic biometry reports and calculator-based toric IOL planning workflows. These findings support the feasibility of LLM-assisted workflow automation, with ChatGPT-5 Thinking providing the most favorable balance of accuracy and efficiency in this setting.