Abstract
BACKGROUND AND PURPOSE: Despite widespread adoption of automatic segmentation (AS), manual review and adjustment of the generated contours remain essential. This process is time-consuming, and identifying clinically relevant corrections remains challenging. Inter-observer variability and the risk of overlooking significant errors further complicate the workflow. A dedicated quality assurance (QA) tool could both safeguard contour quality and speed up manual review. The primary aim of this work is to identify critical segmentation errors while reducing unnecessary manual review, enabling efficient integration of AS into routine radiotherapy.

MATERIALS AND METHODS: We developed an evaluation assistant that assesses contour quality using two geometric measures, the Dice similarity coefficient and the Hausdorff distance. These were combined with a dose prediction model to determine the clinical relevance of contour deviations. The system was validated on 30 glioblastoma cases with ground-truth and manually modified organ-at-risk (OAR) contours. A traffic-light decision matrix classified contours based on geometric and dose parameters and flagged structures for human review.

RESULTS: Of 507 analyzed OARs, 180 were classified as critical. Our approach identified 173 of these critical structures (sensitivity: 0.96, specificity: 0.55). The system flagged 317 organs (61%) as critical, effectively ruling out 39% as non-critical, with only 7 false-negative structures.

CONCLUSIONS: Our dual-layer QA approach identifies critical OAR segmentations with high sensitivity and acceptable specificity, potentially reducing manual review requirements substantially. By focusing on clinically relevant dose/volume endpoints, this method helps assure the quality of brain AS results in clinical radiotherapy practice.
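To make the dual-layer idea concrete, the following is a minimal sketch of how the two geometric measures and a traffic-light classification could be combined. It is not the implementation used in this work: the function names, the representation of contours as sets of voxel coordinates (distances in voxel units), the three thresholds, and the rule that maps geometry and dose to a colour are all assumptions chosen for illustration. The sensitivity check at the end uses the counts reported above (173 detected, 7 missed).

```python
import math


def dice(a, b):
    """Dice similarity coefficient of two voxel sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty voxel sets."""
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for empty sets")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


def traffic_light(dsc, hd, dose_delta,
                  dsc_min=0.8, hd_max=5.0, dose_max=1.0):
    """Hypothetical decision rule; thresholds are illustrative assumptions.

    dose_delta is the predicted change in a dose metric for the structure.
    """
    geometry_ok = dsc >= dsc_min and hd <= hd_max
    dose_ok = abs(dose_delta) <= dose_max
    if geometry_ok and dose_ok:
        return "green"   # no review needed
    if dose_ok:
        return "yellow"  # geometric deviation without dose relevance
    return "red"         # dose-relevant deviation, flag for review


def sensitivity(true_positives, false_negatives):
    """Fraction of critical structures that were flagged."""
    return true_positives / (true_positives + false_negatives)


# Toy example: a 4x4 single-slice contour and the same contour shifted by one voxel.
auto = {(x, y, 0) for x in range(4) for y in range(4)}
edited = {(x, y, 0) for x in range(1, 5) for y in range(4)}

dsc = dice(auto, edited)      # 12 shared voxels of 16 + 16
hd = hausdorff(auto, edited)  # one-voxel shift
label = traffic_light(dsc, hd, dose_delta=0.2)
```

In this toy case the shifted contour falls below the assumed Dice threshold while the assumed dose change stays small, so the rule returns the intermediate category rather than flagging the structure as critical. The brute-force distance computation is quadratic in the number of voxels and is only meant for small illustrative inputs.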