Abstract
Accurate delineation of treatment targets and organs at risk (OARs) is essential to the success of radiotherapy (RT). Although artificial intelligence (AI)-based segmentation methods have successfully automated the delineation process, a reliable and efficient quality assurance (QA) mechanism is still missing, particularly in the time-critical setting of online adaptive RT (oART). This study addresses this unmet clinical need by introducing an AI-driven multi-organ contour QA framework with uncertainty quantification. Using MR-guided oART for prostate cancer as the testbed, we developed an automatic multi-structure QA framework built around a contour quality estimation (ConQuE) model. Built on a ResNet34 backbone, ConQuE takes binary segmentation masks together with the corresponding MR images and classifies each contour as 'acceptable' or 'revision required' on a slice-by-slice basis. To ensure scalability, we propose a unified framework for multi-organ QA that embeds a structure-specific code into the fully connected layers. To quantify classification uncertainty, Monte Carlo (MC) dropout was employed, enabling confidence assessment of model predictions to support informed decision-making in oART. ConQuE was trained to assess the segmentation quality of the prostate and six OARs: rectum, bladder, urethra, penile bulb, cauda equina, and femoral heads. The dataset was split into 249 cases for training, 31 for validation, and 31 held out for testing. With 30 MC dropout passes, ConQuE completes QA for one image slice in approximately 13.1 ms. Overall, ConQuE achieved an accuracy of 93.9% across all structures, with structure-specific accuracy consistently above 90%, demonstrating its strong capability to identify contours that require revision. Moreover, incorrect predictions were strongly correlated with high uncertainty scores, underscoring the value of uncertainty quantification for improving QA reliability.
The proposed AI-based framework has the potential to automate contour QA with quantified uncertainty. Its high accuracy, efficiency, and reliability make it well suited to the tight time constraints of oART.
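As an illustration of the MC dropout step described above, the following minimal sketch shows how repeated stochastic forward passes can be aggregated into a slice-level decision and an uncertainty score. It is not the ConQuE implementation: the toy logistic head (`dropout_forward`) stands in for the ResNet34-based network, and the function names, the dropout rate of 0.5, the 0.5 decision threshold, and the choice of standard deviation and predictive entropy as uncertainty measures are all assumptions made for illustration. Only the figure of 30 passes comes from the text.

```python
import math
import random
from statistics import mean, pstdev


def dropout_forward(features, weights, bias, p_drop, rng):
    """One stochastic pass of a toy logistic head with dropout left on.

    Hypothetical stand-in for the real network: each input feature is
    dropped with probability p_drop (inverted-dropout scaling).
    """
    keep = 1.0 - p_drop
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            z += w * x / keep
    return 1.0 / (1.0 + math.exp(-z))  # P('revision required')


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with parameter p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mc_dropout_qa(features, weights, bias, n_passes=30, p_drop=0.5,
                  threshold=0.5, seed=0):
    """Aggregate n_passes stochastic passes into a label plus uncertainty.

    The uncertainty measures (std of the pass probabilities and entropy of
    their mean) are assumed here; the source does not specify its score.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = random.Random(seed)
    probs = [dropout_forward(features, weights, bias, p_drop, rng)
             for _ in range(n_passes)]
    p_mean = mean(probs)
    return {
        "label": "revision required" if p_mean >= threshold else "acceptable",
        "p_revision": p_mean,
        "std": pstdev(probs),
        "entropy": binary_entropy(p_mean),
    }


if __name__ == "__main__":
    # Toy inputs; in practice the features would come from the image and mask.
    print(mc_dropout_qa([1.0, 2.0], [0.5, -0.3], 0.1))
```

In a workflow of this kind, slices whose uncertainty exceeds a chosen cutoff could be routed to a clinician for manual review regardless of the predicted label; the cutoff itself would need to be calibrated on validation data.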