Abstract
OBJECTIVES: To develop and validate a deep learning model for whole breast clinical target volume (CTV) contouring and to evaluate the clinical features that affect its performance.

METHODS: Five datasets comprising 857 patients from a single center were used. Dataset 1 (n = 300) was used to train and test the model. Dataset 2 (n = 10) was used to evaluate contouring time and dosimetric parameters. Datasets 3 (n = 20) and 4 (n = 10) were used for clinical evaluation. Dataset 5 (n = 517) was used to identify clinical factors influencing auto-contouring accuracy. Model performance was assessed using the Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95).

RESULTS: In Dataset 1, the median DSC and HD95 were 0.941 and 1.75 mm for the left-sided model and 0.937 and 2.47 mm for the right-sided model. In Dataset 2, both auto-contouring and auto-contouring with manual corrections were significantly faster than manual contouring (P = .005 for both), while still achieving clinically acceptable dosimetric results. In Dataset 3, two physicians rated automatic and manual contours as equivalent (P = .214, P = .075), while the third rated auto-contouring higher (P < .001). In Dataset 4, the auto-contouring model outperformed one of five physicians by DSC (P = .009) and three of five by HD95 (P = .015, P = .007, P = .017). In Dataset 5, a peripheral tumor bed and low-density breast tissue were each associated with lower DSC (P < .001 for both) and higher HD95 (P < .001 for both). Auto-contouring performed better in cases without these unfavorable factors than in cases with them (P < .001 for both).

CONCLUSIONS: The proposed model demonstrated acceptable accuracy, consistency, and efficiency in whole breast CTV contouring. A peripheral tumor bed and low-density breast tissue reduced auto-contouring performance.

ADVANCES IN KNOWLEDGE: The characteristics of challenging cases in whole breast CTV auto-contouring should be identified.
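The two accuracy metrics named in the Methods, DSC and HD95, can be illustrated with a minimal sketch for a pair of binary masks. This is not the study's implementation: the function names `dice`, `surface_points`, and `hd95` are hypothetical, the masks are assumed to be NumPy arrays on the same voxel grid, and HD95 is taken here as the larger of the two directed 95th percentile surface distances, a convention that differs slightly between toolkits.

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient: 2|A and B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

def surface_points(mask, spacing):
    """Physical coordinates of mask voxels that have a background face-neighbor."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        # A voxel is interior only if both neighbors along every axis are foreground.
        interior &= np.roll(padded, 1, axis) & np.roll(padded, -1, axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    boundary = mask & ~interior[core]
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)

def hd95(a, b, spacing):
    """Symmetric 95th percentile Hausdorff distance, in the units of `spacing`."""
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("HD95 is undefined when either mask is empty")
    # Brute-force pairwise distances; adequate for small illustrative masks only.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b = np.percentile(d.min(axis=1), 95)
    b_to_a = np.percentile(d.min(axis=0), 95)
    return max(a_to_b, b_to_a)

if __name__ == "__main__":
    # Toy 2D example: a 4x4 square and the same square shifted by one pixel.
    manual = np.zeros((10, 10), dtype=bool)
    manual[2:6, 2:6] = True
    auto = np.zeros((10, 10), dtype=bool)
    auto[2:6, 3:7] = True
    print("DSC:", dice(manual, auto))
    print("HD95 (mm, assuming 1 mm pixels):", hd95(manual, auto, (1.0, 1.0)))
```

The pairwise distance matrix grows with the product of the two surface sizes, so a full 3D CTV would normally call for a distance-transform or KD-tree approach instead of this brute-force version.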