Abstract
OBJECTIVE: Current intraoperative navigation systems have proven effective for organs with relatively fixed shapes, but they struggle to accommodate tissue deformation and displacement in gastrointestinal surgery. This study evaluates the established YOLOv8 and the emerging YOLOv12, which offers enhanced feature extraction, with the aim of identifying an optimal real-time model for dynamic surgical scenarios that improves procedural efficiency and safety.
METHODS: In this multicenter retrospective study, YOLOv8 and YOLOv12 models were trained for object detection and instance segmentation on 1,847 images extracted from 22 surgical videos collected at four hospitals nationwide. The models were then validated and tested, and their performance was compared using standard metrics, including precision, recall, mAP@0.5, mAP@0.5:0.95, and weight file size. In addition, the clinical applicability of the top-performing models was evaluated through a questionnaire survey.
RESULTS: Both YOLOv8 and YOLOv12 performed competently in object detection and instance segmentation. On the test set, YOLOv12 achieved significantly higher recall than YOLOv8 in both object detection and instance segmentation (P = 0.037 and P = 0.031, respectively). Moreover, for the YOLOv12 model on the test set, object detection significantly outperformed instance segmentation in mAP@0.5 and recall (P = 0.045 and P = 0.036, respectively). The weight files of YOLOv8 and YOLOv12 were 6.8 megabytes (MB) and 6.0 MB, respectively. Questionnaire results suggested that AI assistance may reduce operative time and lower the risk of missed lymph node detection among junior surgeons.
CONCLUSION: In settings with limited hardware resources, object detection with the YOLOv12 model is strongly recommended to assist robotic colon cancer surgery and enhance surgical efficiency and safety.