Abstract
PURPOSE: This study aimed to develop and evaluate a deep learning model that directly analyzes three-dimensional automated breast ultrasound videos (DL-3DABUV) to assist breast cancer diagnosis, and to identify the optimal reading mode for clinical implementation.
METHODS: This retrospective study included 547 patients (285 benign, 262 malignant), who were randomly assigned to a training set (n=437) or a test set (n=110). The DL-3DABUV model, built on ResNet50 and multi-instance learning, was trained directly on the videos, without image selection or manual annotation. Six radiologists (three experienced and three novice) evaluated the test set under three modes: independent-reading (without DL-3DABUV), second-reading (without prior knowledge of DL-3DABUV results), and concurrent-reading (after viewing DL-3DABUV results). The diagnostic performance of DL-3DABUV, experienced radiologists, and novice radiologists was compared, and reading times across the three modes were assessed.
RESULTS: Compared with experienced radiologists in independent-reading, DL-3DABUV showed no significant differences in area under the receiver operating characteristic curve (AUC) (0.82 vs. 0.83), sensitivity (82.1% vs. 81.6%), or specificity (81.5% vs. 88.3%) (all P>0.05). DL-3DABUV achieved a higher AUC and specificity than novice radiologists in independent-reading (AUC: 0.82 vs. 0.68, P<0.001; specificity: 81.5% vs. 57.4%, P<0.001). However, novice performance reached parity with DL-3DABUV in both second-reading and concurrent-reading. No significant differences in diagnostic performance were observed between second-reading and concurrent-reading. Concurrent-reading significantly reduced reading time by 33.6 seconds compared with second-reading (P<0.001).
CONCLUSION: DL-3DABUV achieves diagnostic performance comparable to that of experienced radiologists and improves the diagnostic accuracy of novices. Concurrent-reading provides a more efficient workflow by reducing reading time while maintaining diagnostic performance.