Abstract
Non-mass enhancement (NME) in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) often appears as subtle, diffuse enhancement, which makes accurate delineation challenging for radiologists. In this study, we developed an automated segmentation method for NME in DCE-MRI using a three-dimensional (3D) TransUNet model with a temporal attention mechanism. The attention module uses difference images between the pre-contrast and early post-contrast phases to highlight regions that exhibit enhancement-related temporal changes. Our dataset consisted of 151 DCE-MRI cases with NME, each including both pre-contrast and early post-contrast phases. The proposed network comprises two parallel CNN encoders (one for early post-contrast images and one for early–pre subtraction images), a temporal attention module, a transformer encoder, and a CNN decoder. The two encoders independently extract features from their respective inputs. The temporal attention module uses the features from the difference branch to weight those of the early branch. The weighted features are then fused and passed to the transformer encoder, which captures spatial and contextual dependencies within the 3D volume. The CNN decoder reconstructs the NME segmentation map through transposed convolutions and upsampling layers. In five-fold cross-validation, the proposed method achieved a Jaccard index of 0.724 and a Dice coefficient of 0.802, significantly outperforming the baseline 3D TransUNet (Jaccard: 0.585, p < .001; Dice: 0.690, p < .001). These results demonstrate the high segmentation performance of the proposed method for NME and indicate its potential to improve diagnostic assessment in breast DCE-MRI.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12194-025-01004-y.
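To make the temporal weighting step and the reported overlap metrics concrete, the following is a minimal NumPy sketch, not the network described in this study. It rests on assumptions not stated in the abstract: the attention weights are taken to be an element-wise sigmoid of the difference-branch features, the subsequent fusion step is omitted, and the function names `temporal_attention` and `dice_and_jaccard` are hypothetical. The Dice coefficient and Jaccard index follow their standard definitions for binary masks.

```python
import numpy as np


def temporal_attention(early_feat, diff_feat):
    """Weight early-branch features with a gate from the difference branch.

    Assumption: the gate is an element-wise sigmoid of the difference-branch
    features; the abstract does not specify the actual form of the module.
    """
    gate = 1.0 / (1.0 + np.exp(-diff_feat))  # values in (0, 1)
    return gate * early_feat                 # fusion step is omitted here


def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0, 1.0
    return 2.0 * intersection / total, intersection / union


# Toy 3D example: an 8-voxel prediction inside a 12-voxel reference mask.
pred = np.zeros((4, 4, 4), dtype=bool)
truth = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
truth[1:3, 1:3, 1:4] = True
dice, jaccard = dice_and_jaccard(pred, truth)
print(f"Dice = {dice:.3f}, Jaccard = {jaccard:.3f}")
```

In this toy case the intersection is 8 voxels and the union is 12, so the Dice coefficient is 16/20 = 0.800 and the Jaccard index is 8/12 ≈ 0.667.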