Abstract
PURPOSE: This study presents a deep learning framework for automatic parotid segmentation using three-dimensional (3D) U-Net and attention-augmented 3D U-Net architectures trained with a novel combined loss function tailored for anatomical accuracy and class imbalance.

MATERIALS AND METHODS: A curated dataset of 379 noncontrast head-and-neck computed tomography scans with expert-verified contours was used. Two architectures, a residual 3D U-Net and its attention-enhanced variant, were implemented in TensorFlow. The networks were trained with both categorical cross-entropy and a proposed combined loss integrating a modified Dice score coefficient (mDSC) term and focal loss (FL), weighted 0.7 and 0.3, respectively. The models were evaluated using the Dice similarity coefficient (DSC), Intersection over Union (IoU), and categorical accuracy. A custom checkpointing strategy was designed to preserve the model weights corresponding to both the peak validation DSC and the minimum validation loss. The code and pretrained models are hosted in a publicly available GitHub repository at https://github.com/1aryantyagi/Segmentation-Paper.

RESULTS: The 3D U-Net trained with the combined loss achieved median Dice scores of 0.8835 (left parotid) and 0.8709 (right parotid), with mean IoU values of 0.7672 and 0.7358, indicating strong segmentation accuracy. The attention-augmented variant produced comparable results, supporting the consistency of the combined loss across architectures. Bland-Altman analysis confirmed reduced variability and improved agreement.

CONCLUSION: The integration of mDSC and FL within a 3D U-Net architecture significantly improves segmentation performance, robustness, and spatial precision. These findings support the clinical feasibility of the proposed framework for automated, reproducible parotid delineation in radiotherapy planning.
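The combined loss described above can be sketched as follows. This is a minimal, framework-agnostic NumPy illustration of the 0.7/0.3 weighting reported in the abstract, not the authors' TensorFlow implementation: the exact "modified" Dice formulation (mDSC) is not specified here, so a standard soft Dice loss is shown, and the function names `dice_loss`, `focal_loss`, and `combined_loss` are illustrative.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Soft Dice loss averaged over classes; the paper's "modified" DSC
    # variant is unspecified in the abstract, so a standard soft Dice is shown.
    # Inputs are one-hot / softmax volumes of shape (batch, D, H, W, classes).
    axes = tuple(range(1, y_true.ndim - 1))  # spatial axes (D, H, W)
    intersection = np.sum(y_true * y_pred, axis=axes)
    union = np.sum(y_true, axis=axes) + np.sum(y_pred, axis=axes)
    dice = (2.0 * intersection + smooth) / (union + smooth)
    return 1.0 - dice.mean()

def focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    # Categorical focal loss: down-weights easy voxels via (1 - p)^gamma,
    # addressing the class imbalance between parotid and background voxels.
    p = np.clip(y_pred, eps, 1.0 - eps)
    ce = -y_true * np.log(p)
    return np.mean(np.sum(((1.0 - p) ** gamma) * ce, axis=-1))

def combined_loss(y_true, y_pred):
    # 0.7 / 0.3 weighting as reported in the abstract.
    return 0.7 * dice_loss(y_true, y_pred) + 0.3 * focal_loss(y_true, y_pred)
```

A perfect prediction drives both terms toward zero, while the focal term keeps gradient signal focused on hard, misclassified voxels during training.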
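The dual checkpointing strategy (retaining weights at both the peak validation DSC and the minimum validation loss) can be illustrated with a small tracker class. This is a hypothetical re-implementation of the idea in plain Python, not the authors' code; the class name `DualCheckpoint` and its `update` method are assumptions.

```python
class DualCheckpoint:
    """Tracks two weight snapshots across epochs: one at the highest
    validation DSC seen so far and one at the lowest validation loss.
    Illustrative sketch of the abstract's custom checkpointing strategy."""

    def __init__(self):
        self.best_dsc = float("-inf")
        self.min_loss = float("inf")
        self.best_dsc_weights = None
        self.min_loss_weights = None

    def update(self, val_dsc, val_loss, weights):
        # Called once per epoch; each criterion is checked independently,
        # so the two snapshots may come from different epochs.
        if val_dsc > self.best_dsc:
            self.best_dsc = val_dsc
            self.best_dsc_weights = weights
        if val_loss < self.min_loss:
            self.min_loss = val_loss
            self.min_loss_weights = weights
```

Keeping both snapshots hedges against the two criteria disagreeing: the epoch with the best overlap metric is not always the epoch with the lowest composite loss.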