Abstract
PURPOSE: This study evaluated the impact of a commercial AI-assisted contouring tool on intra- and inter-observer variability in prostate radiation therapy and assessed the dosimetric consequences of geometric contour differences.
METHODS: Two experienced radiation oncologists independently delineated the clinical target volume (CTV) and organs at risk (OARs) for prostate cancer patients. Manual contours (C(man)) and AI-generated contours (C(AI)) were compared with manually adjusted AI contours (C(AI,adj)); a consensus reference (C(ref)) served as the benchmark. To evaluate clinical impact, treatment plans were recalculated on each contour set under identical beam geometry and, separately, replanned, and dose-volume histogram (DVH) parameters were assessed.
RESULTS: AI-assisted contouring significantly improved both intra- and inter-observer agreement. In the inter-observer analysis, the Dice similarity coefficient (DSC) for the CTV increased from 0.78 ± 0.11 with C(man) to 0.89 ± 0.09 with C(AI,adj). Similarly, the intra-observer analysis showed significantly higher DSCs for C(AI,adj) than for C(man) for both oncologists. Geometric comparison against C(ref) revealed that although adjustments to C(AI) improved accuracy, C(AI,adj) generally did not surpass C(man) for the CTV and rectum. Dosimetric analyses demonstrated that, under fixed plan geometry, both C(man) and C(AI,adj) yielded lower planning target volume (PTV) D95% values than C(ref), whereas after replanning all plans met institutional criteria with no clinically significant differences among contour sets.
CONCLUSION: AI-assisted contouring in prostate radiotherapy reduced intra- and inter-observer variability and improved contouring consistency. However, C(AI,adj) did not consistently surpass C(man), especially for the CTV and rectum, where automation bias or selective clinical acceptance may have influenced the edits. Fixed-plan recalculations showed that even minor geometric deviations can produce measurable dose differences. These findings underscore the importance of structured quality assurance (QA) and human oversight to mitigate automation bias while leveraging AI's efficiency. The single-institution design, with two oncologists and a single AI software package, limits generalizability and highlights the need for multi-observer validation.
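For reference, the Dice similarity coefficient reported above is the standard overlap measure between two delineations; assuming A and B denote the voxel sets enclosed by two contours of the same structure, it is defined as

$$\mathrm{DSC}(A, B) = \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert},$$

ranging from 0 (no overlap) to 1 (perfect agreement).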