Abstract
BACKGROUND: Extracapsular extension (ECE) of prostate cancer (PCa) is a crucial determinant in preoperative staging and significantly influences urologists’ surgical planning. Although magnetic resonance imaging (MRI) aids staging by depicting ECE, its diagnostic accuracy is frequently limited by poor image quality. Deep learning-based diffusion-weighted imaging (DL-DWI) has the potential to enhance diagnostic accuracy by improving image quality.
METHODS: This multicenter retrospective study enrolled 252 consecutive patients with clinically suspected PCa from five centers between June 2019 and September 2023. All patients underwent multiparametric MRI (mpMRI) followed by radical prostatectomy. A denoising diffusion probabilistic model (DDPM) was developed to synthesize high-b-value DL-DWI images. Two radiology residents independently evaluated the image quality of DL-DWI and conventionally acquired DWI (C-DWI) using a five-point Likert scale. Objective image quality metrics, including estimated signal-to-noise ratio (eSNR), contrast-to-noise ratio (CNR), edge rise distance (ERD), and edge rise slope (ERS), were measured and compared between sequences. Four radiologists (experience range: 2–20 years) independently graded ECE in each patient on DL-DWI and C-DWI using the four-grade ECE grading system. Intra-reader agreement was assessed using weighted kappa, and inter-reader agreement using Fleiss’ kappa. Diagnostic performance was compared using multi-reader multi-case receiver operating characteristic analysis, with the area under the curve (AUC) as the metric. Decision curve analysis (DCA) was used to evaluate net clinical benefit. Statistical significance was defined as a two-sided P < 0.05.
RESULTS: DL-DWI demonstrated significantly superior image quality compared to C-DWI (P < 0.05) and higher agreement in quality assessments, both intra-reader (weighted κ: 0.59, 0.45 vs. 0.40, 0.33 for readers 1 and 2, respectively) and inter-reader (weighted κ: 0.59 vs. 0.48). Quantitative analysis confirmed substantial improvement with DL-DWI across all objective metrics: eSNR (56.66 ± 7.36 vs. 31.76 ± 12.97, P < 0.001), CNR (28.89 ± 9.27 vs. 10.53 ± 5.48, P < 0.001), ERD (2.85 [IQR: 1.90–3.80] vs. 4.27 [IQR: 3.27–5.60], P < 0.001), and ERS (69.48 ± 19.15 vs. 23.82 ± 10.29, P < 0.001). For ECE detection, DL-DWI showed higher intra-reader agreement (weighted κ for readers 3–6: 0.53, 0.56, 0.67, 0.70 vs. 0.53, 0.34, 0.52, 0.49) and inter-reader agreement (Fleiss’ κ: 0.25 vs. 0.18) than C-DWI. DL-DWI achieved significantly higher diagnostic performance (mean AUC: 0.76 [95% CI: 0.71–0.80] vs. 0.70 [95% CI: 0.65–0.74], P < 0.001). DCA indicated greater net clinical benefit for DL-DWI than for C-DWI.
CONCLUSION: DL-DWI appears to outperform C-DWI in both image quality and clinical utility, showing potential to improve ECE diagnostic accuracy for residents, enhance inter-reader consistency, and provide greater net benefit in clinical decision-making.
CLINICAL TRIAL NUMBER: Not applicable.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12880-025-02109-x.