Abstract
This study aimed to verify whether a commercial deep learning-based automatic segmentation (DLS) method maintains the geometric accuracy of its contours after a software update, and to propose a streamlined validation method that minimizes the burden on clinical workflows. The study included 109 participants. Radiation oncologists delineated 28 organs in the head and neck, chest, abdomen, and pelvic regions on computed tomography (CT) images. Automatic contours were generated on the same CT images using AI-Rad Companion Organs RT (AIRC; Siemens Healthineers, Erlangen, Germany) versions VA30, VA50, and VA60. The Dice similarity coefficient, maximum Hausdorff distance, and mean distance to agreement were calculated to identify contours that differed substantially among versions. For the identified contours, the geometric indices of VA30, VA50, and VA60 were then recalculated using the contours delineated by radiation oncologists as the ground truth, and statistical analysis was performed on these indices to test for differences between versions. Of the 28 contours evaluated, nine did not satisfy the established criteria. Statistical analysis revealed that the brain, rectum, and bladder contours differed substantially across AIRC versions. In particular, the pre-update rectum contour had a mean (range) Hausdorff distance of 0.76 (0.40–1.16), whereas the post-update rectum contour was of lower quality, with a mean (range) Hausdorff distance of 1.13 (0.24–5.68). Therefore, commercial DLS methods that are updated regularly must be reassessed for quality in each region of interest. The proposed method can help reduce the burden on clinical workflows while appropriately evaluating post-update DLS performance.
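The three geometric indices named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation or any AIRC feature; the function names are hypothetical, the Dice coefficient is computed on boolean voxel masks, and the Hausdorff distance and mean distance to agreement are computed on contour point sets. Definitions of mean distance to agreement vary between tools; here it is assumed to be the mean of both directed nearest-point distances, and units follow whatever coordinates the points are given in.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two boolean masks: 2|A∩B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def _nearest_dists(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """For each point in p (N x D), the Euclidean distance to the nearest point in q (M x D)."""
    diff = p[:, None, :] - q[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def hausdorff_max(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric (maximum) Hausdorff distance between two contour point sets."""
    return float(max(_nearest_dists(p, q).max(), _nearest_dists(q, p).max()))


def mean_dta(p: np.ndarray, q: np.ndarray) -> float:
    """Mean distance to agreement, assumed here to be the mean of both directed nearest-point distances."""
    d = np.concatenate([_nearest_dists(p, q), _nearest_dists(q, p)])
    return float(d.mean())


# Toy example: two 2 x 2 squares overlapping by one column of voxels.
a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True
b = np.zeros((4, 4), dtype=bool)
b[0:2, 1:3] = True
print(dice(a, b))  # 0.5

# Toy contour point sets (e.g., in mm).
p = np.array([[0.0, 0.0], [1.0, 0.0]])
q = np.array([[0.0, 0.0], [3.0, 0.0]])
print(hausdorff_max(p, q))  # 2.0
print(mean_dta(p, q))  # 0.75
```

In a version-to-version comparison like the one described, the same functions would be applied with one AIRC version's contour as `p` and another version's (or the oncologist's) contour as `q`; the brute-force nearest-point search is quadratic in the number of points, so large contours would call for a KD-tree or distance-transform approach instead.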