Abstract
Background: The Vesical Imaging-Reporting and Data System (VI-RADS) is valuable for the overall assessment of non-muscle-invasive bladder cancer (NMIBC), but its diagnostic accuracy is limited for distinguishing NMIBC from muscle-invasive disease within VI-RADS categories 2 and 3. Dynamic contrast-enhanced MRI (DCE-MRI), which reflects tumor vascularity, holds promise for these challenging cases, but its spatiotemporal information remains largely unexploited.

Methods: We developed a deep learning model to comprehensively quantify spatiotemporal features from multiphase DCE-MRI in 184 patients with VI-RADS 2 or 3 lesions (training: n = 115; validation: n = 20; testing: n = 49). The model integrated multiscale feature extraction and contextual attention mechanisms to enhance diagnostic performance.

Results: The model outperformed established benchmarks (e.g., VGG, ResNet) and the conventional VI-RADS ≤ 2 threshold (sensitivity for NMIBC: 0.67), achieving a sensitivity of 0.90 (95% CI: 0.81-0.96) for NMIBC and an area under the curve (AUC) of 0.82 (95% CI: 0.75-0.89) for overall classification. Visualizations showed that the model identifies key spatiotemporal patterns linked to muscle invasion.

Conclusions: By leveraging comprehensive spatiotemporal information from DCE-MRI, our deep learning model significantly improves NMIBC diagnosis in VI-RADS 2/3 cases, offering a clinically valuable tool that addresses the limitations of current VI-RADS assessment.
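
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how a sensitivity with a 95% confidence interval and an AUC can be computed. It is illustrative only: the function names are hypothetical, NMIBC is assumed to be coded as the positive class (1), the Wilson score interval is an assumed choice because the abstract does not state how the intervals were derived, and the labels and scores are synthetic rather than study data.

```python
import math


def sensitivity_with_wilson_ci(y_true, y_pred, positive=1, z=1.96):
    """Sensitivity for the positive class with a Wilson score interval.

    Assumption: the Wilson interval is used here for illustration; the
    abstract does not specify the interval method.
    """
    n_pos = sum(1 for t in y_true if t == positive)
    if n_pos == 0:
        raise ValueError("no positive cases in y_true")
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    p_hat = true_pos / n_pos
    denom = 1 + z * z / n_pos
    centre = (p_hat + z * z / (2 * n_pos)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / n_pos + z * z / (4 * n_pos * n_pos)
    ) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


def auc_rank(y_true, scores, positive=1):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic example (not study data): 1 = NMIBC, 0 = muscle-invasive.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6]

sens, ci_low, ci_high = sensitivity_with_wilson_ci(y_true, y_pred)
auc = auc_rank(y_true, scores)
print(f"sensitivity = {sens:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
print(f"AUC = {auc:.3f}")
```

With small samples the Wilson interval is wide, which is one reason intervals should be reported alongside point estimates such as those above.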