Abstract
PURPOSE: We aim to evaluate the performance of different deep learning (DL) architectures in breast density classification using digital mammograms (DMs) and synthetic mammograms (SMs) from digital breast tomosynthesis (DBT). APPROACH: We retrospectively analyzed routine mammographic screening exams (Selenia Dimensions, Hologic Inc.) acquired between 2015 and 2018 at our institution. Each mammogram dataset (DM and SM) included 10,000 exams representing all four Breast Imaging Reporting and Data System (BI-RADS) density categories (a to d). We used ResNet-50, EfficientNet-B0, and DenseNet-121 architectures, separately fine-tuned for breast density classification with DM and SM. Classification accuracy was assessed on unseen test sets (10% of each dataset) in four-category (a to d) and binary (nondense versus dense) scenarios. Evaluations also considered mammogram view (craniocaudal [CC] versus mediolateral-oblique [MLO]) and race (White versus Black women). RESULTS: DL architectures showed detectable, yet small, differences in classification accuracy regardless of mammogram format. ResNet-50 achieved a four-category accuracy of 0.727 (95% CI: [0.713, 0.740]) for DM, numerically higher than the 0.713 (95% CI: [0.699, 0.728]) for SM, although the difference was not statistically significant (p = 0.151). EfficientNet-B0 and DenseNet-121 showed similar trends. DM-SM differences for binary classification were of similar magnitude but statistically significant (p < 0.05), with test accuracies ranging from 0.871 to 0.920. The MLO view generally outperformed the CC view, and the results were consistent across racial groups. CONCLUSIONS: Various DL architectures perform effectively in breast density classification, and mammogram format and view both affect accuracy, though results may vary with different vendors. These insights are crucial for enhancing DL-based breast density assessment, especially during the shift from DM to DBT.
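As an illustration of the two evaluation scenarios named above, the following minimal sketch computes four-category and binary accuracy with a 95% confidence interval. It rests on assumptions the abstract does not state: the binary split groups categories a and b as nondense and c and d as dense (the usual BI-RADS convention), and the interval uses a normal (Wald) approximation, which may differ from the method the authors used. The function names and the toy labels are hypothetical.

```python
import math

# Assumption: categories c and d count as "dense", a and b as "nondense".
DENSE = {"c", "d"}


def to_binary(label):
    """Collapse a four-category density label (a to d) into nondense/dense."""
    return "dense" if label in DENSE else "nondense"


def accuracy_with_ci(y_true, y_pred, z=1.96):
    """Return (accuracy, lower, upper) using a normal-approximation interval.

    Assumption: a Wald interval is used here only for illustration; the
    abstract does not say how its 95% CIs were obtained.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    acc = correct / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


# Toy labels, invented for this example (not data from the study).
y_true = list("aabbccdd")
y_pred = list("abbbcdda")

# Four-category scenario: labels must match exactly.
acc4, lo4, hi4 = accuracy_with_ci(y_true, y_pred)

# Binary scenario: compare labels after collapsing to nondense/dense.
acc2, lo2, hi2 = accuracy_with_ci(
    [to_binary(t) for t in y_true],
    [to_binary(p) for p in y_pred],
)

print(f"four-category accuracy: {acc4:.3f}")  # 0.625
print(f"binary accuracy:        {acc2:.3f}")  # 0.875
```

In this toy example, confusions between adjacent categories on the same side of the split (a versus b, c versus d) count as errors in the four-category scenario but not in the binary one, which is one reason binary accuracies tend to be higher.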