Abstract
PURPOSE: To build and evaluate deep learning models for recognizing cataract surgical steps from full-length surgical videos with minimal preprocessing, including identification of routine and complex steps. METHODS: We collected 298 cataract surgical videos from 12 resident surgeons across 6 sites and excluded 30 that were incomplete, duplicated, or combination surgeries. Videos were downsampled to 1 frame per second. Trained annotators labeled 13 steps of surgery: create wound, injection into the eye, capsulorrhexis, hydrodissection, phacoemulsification, irrigation/aspiration, place lens, remove viscoelastic, close wound, advanced technique/other, stain with trypan blue, manipulating iris, and subconjunctival injection. We trained two deep learning models, one based on the VGG16 architecture (VGG model) and the second using VGG16 followed by a long short-term memory network (convolutional neural network [CNN]-recurrent neural network [RNN] model). Class activation maps were visualized using Grad-CAM. RESULTS: Overall top-1 prediction accuracy was 76% for the VGG model (top-3 accuracy, 93%) and 84% for the CNN-RNN model (top-3 accuracy, 97%). The microaveraged area under the receiver operating characteristic curve was 0.97 for the VGG model and 0.99 for the CNN-RNN model. The microaveraged average precision score was 0.83 for the VGG model and 0.92 for the CNN-RNN model. Class activation maps indicated that the model focused appropriately on the instrumentation used in each step when identifying which step was being performed. CONCLUSIONS: Deep learning models can classify cataract surgical activities on a frame-by-frame basis with high accuracy, particularly for routine surgical steps. TRANSLATIONAL RELEVANCE: An automated system for recognizing cataract surgical steps could provide residents with automated feedback metrics, such as the length of time spent on each step.
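As an illustration only, the following minimal Python sketch shows how top-k accuracy and per-step time could be derived from frame-level predictions sampled at 1 frame per second. It is not the study's implementation: the function names `top_k_accuracy` and `step_durations`, the three-class toy scores, and the example label sequence are all hypothetical and chosen for brevity.

```python
from collections import defaultdict
from typing import Dict, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of frames whose true label is among the k highest-scoring classes."""
    hits = 0
    for frame_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score for this frame.
        ranked = sorted(range(len(frame_scores)),
                        key=lambda c: frame_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def step_durations(predicted: Sequence[str], fps: float = 1.0) -> Dict[str, float]:
    """Total seconds attributed to each step, given one predicted label per sampled frame."""
    seconds: Dict[str, float] = defaultdict(float)
    for step in predicted:
        # Each sampled frame stands for 1/fps seconds of video.
        seconds[step] += 1.0 / fps
    return dict(seconds)


# Hypothetical toy data: 4 frames, 3 classes (not values from the study).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 0, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)

# Hypothetical per-frame predicted step names at 1 frame per second.
durations = step_durations(
    ["create wound", "create wound", "capsulorrhexis", "create wound"]
)
print(top1, top2, durations)
```

Summing frames per predicted step is the simplest way to turn frame-wise classification into a time-per-step feedback metric; a practical system would likely also smooth the predictions over time before counting.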