Abstract
Introduction
Breast cancer screening mammography faces challenges from variable radiologist performance and missed cancers. Deep learning segmentation models offer promise for automated lesion detection, but most training datasets are biased toward normal cases, limiting performance on clinically relevant abnormalities.

Methods
A custom-enhanced U-Net architecture was trained on annotated abnormal mammograms from the Digital Mammography Dataset for Breast Cancer Diagnosis Research (DMID). Training employed a two-stage approach: (1) patch-based pretraining on 224 × 224 lesion-centered crops with 2:1 negative-to-positive sampling, and (2) full-image fine-tuning with 35% hybrid patch sampling to preserve small-lesion sensitivity. A composite loss function combining focal loss and Tversky loss addressed class imbalance and boundary precision. Images were downsampled from native resolutions of 4000-6000 pixels to 224 × 224, with Contrast Limited Adaptive Histogram Equalization (CLAHE) applied for contrast enhancement. Performance was evaluated on 55 test images using the Dice coefficient, Intersection-over-Union (IoU), pixel accuracy, Hausdorff distance, and lesion detection rate (IoU > 0.10), with size-stratified analysis across small (<500 pixels), medium (500-1500 pixels), and large (>1500 pixels) lesion categories.

Results
The model achieved a mean Dice score of 0.5793 (median, 0.7120), a mean IoU of 0.4902 (median, 0.5541), a pixel accuracy of 0.9930, and an overall lesion detection rate of 78.2% (43/55). Size-stratified analysis revealed a pronounced performance gradient: small lesions demonstrated a 73.0% detection rate (27/37) with a mean Dice score of 0.698; medium lesions achieved an 84.6% detection rate (11/13) with a mean Dice score of 0.724; and large lesions showed a 100% detection rate (5/5) with a mean Dice score of 0.908. Among detected lesions, weak positive correlations were observed between lesion size and segmentation quality (Dice, r = 0.233; IoU, r = 0.215).
The primary failure mode was missed detection of small lesions (n = 12; mean size, 253.3 pixels vs. 1137.6 pixels for detected lesions; p < 0.001), likely attributable to information loss during the 18- to 27-fold resolution reduction.

Conclusions
Abnormal-focused U-Net training achieved strong segmentation of large lesions (Dice, 0.908; 100% detection) but exhibited critical limitations for small abnormalities (27% miss rate), a significant clinical barrier given that early-stage cancers are the primary screening target. The resolution bottleneck introduced by downsampling high-resolution mammography represents a fundamental architectural limitation. Unknown false-positive rates on normal mammograms, which were absent from training, preclude clinical deployment. Future work should prioritize multi-scale architectures, hybrid training incorporating normal cases, external validation across diverse datasets, and prospective evaluation in screening workflows before clinical translation.
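As a minimal sketch of the evaluation metrics named above (Dice coefficient, IoU, and the IoU > 0.10 detection criterion), assuming binary lesion masks as NumPy arrays; the function names and thresholding style are illustrative, not taken from the paper's code:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.sum(pred * target)
    return float((2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def iou(pred, target, eps=1e-7):
    """Intersection-over-Union: |A ∩ B| / |A ∪ B| for binary masks."""
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return float((inter + eps) / (union + eps))

def detected(pred, target, thresh=0.10):
    """A lesion counts as detected when IoU exceeds the 0.10 threshold used above."""
    return iou(pred, target) > thresh
```

With a prediction that half-overlaps a one-pixel lesion, `dice` and `iou` return 2/3 and 0.5 respectively, and the lesion counts as detected under the 0.10 criterion.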
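The Methods describe a composite loss combining focal loss and Tversky loss; the abstract does not give the weighting or hyperparameters, so the values below (focal α and γ, Tversky α/β, mixing weight λ) are placeholder assumptions in a NumPy sketch, not the authors' configuration:

```python
import numpy as np

def focal_loss(pred, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss: down-weights easy pixels via (1 - p_t)^gamma to counter
    the extreme background/lesion class imbalance."""
    pred = np.clip(pred, eps, 1 - eps)
    p_t = np.where(target == 1, pred, 1 - pred)
    a_t = np.where(target == 1, alpha, 1 - alpha)
    return float(np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t)))

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky loss: with alpha > beta, false negatives are penalised more
    than false positives, favouring recall on small lesions."""
    tp = np.sum(pred * target)
    fn = np.sum((1 - pred) * target)
    fp = np.sum(pred * (1 - target))
    return float(1 - (tp + eps) / (tp + alpha * fn + beta * fp + eps))

def composite_loss(pred, target, lam=0.5):
    """Composite objective: a weighted sum of the two terms (weight assumed)."""
    return lam * focal_loss(pred, target) + (1 - lam) * tversky_loss(pred, target)
```

A perfect prediction drives both terms toward zero, while an inverted prediction is heavily penalised by both, which is the behaviour the composite objective relies on.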