Abstract
OBJECTIVES: Infarct volumes on diffusion-weighted imaging (DWI) are critical for predicting stroke outcomes and guiding late-window endovascular thrombectomy. Although 3D U-Net-based deep learning achieves high sensitivity, it often yields false positives due to infarct mimics. We developed a SegMamba-based model to enhance global volumetric feature extraction and compared the two approaches on a dataset encompassing multiple DWI-hyperintense pathologies.

METHODS: Both models (3D U-Net and SegMamba) were trained on a multicenter dataset of 10,820 DWI scans (2011–2014) and evaluated against manual segmentation on an external test set of 2,731 fresh DWI scans. Diagnostic accuracy was assessed in a clinical cohort of 1,194 patients from a different center (2017–2020) who underwent DWI for various indications. We compared the models using the Dice similarity coefficient (DSC), average Hausdorff distance (AHD), sensitivity, and specificity.

RESULTS: The training, external test, and clinical test datasets had mean (SD) ages of 67.9 (12.8), 68.2 (12.7), and 63.9 (15.4) years, and 58.9%, 60.4%, and 58.1% of patients were male, respectively. In the external test dataset, SegMamba and U-Net achieved similar DSC (0.786 vs 0.785; p = 0.141), but SegMamba achieved a lower (better) AHD than U-Net (1.25 mm vs 1.76 mm; p < 0.001). In the clinical dataset, SegMamba showed slightly lower sensitivity (96.97% vs 98.79%) but substantially higher specificity (58.80% vs 29.54%), resulting in higher overall accuracy (64.07% vs 39.11%; p < 0.001).

CONCLUSIONS: Changing only the main architecture of the segmentation model preserved segmentation performance within ischemic-stroke cohorts while improving classification in broader disease populations. This study highlights the need to validate deep-learning models not only for segmentation performance within target disease cohorts but also across diverse clinical environments to ensure practical utility.
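The four evaluation metrics named above can be illustrated with a minimal NumPy sketch. It is not the study's evaluation code: the function names, the voxel `spacing` parameter, the empty-mask conventions, and the particular AHD variant (mean of the two directed average distances over all foreground voxels) are assumptions made for illustration, and published AHD definitions differ in these details.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def average_hausdorff(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Average Hausdorff distance in physical units (e.g. mm).

    Assumed variant: mean of the two directed average nearest-neighbour
    distances, computed over all foreground voxels by brute force.
    Suitable for small masks only (memory grows as N_pred * N_truth).
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    scale = np.asarray(spacing, dtype=float)
    p = np.argwhere(pred) * scale   # (N_pred, 3) physical coordinates
    t = np.argwhere(truth) * scale  # (N_truth, 3) physical coordinates
    if len(p) == 0 or len(t) == 0:
        return float("nan")  # assumed convention: undefined for empty masks
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


def classification_metrics(tp, fn, tn, fp):
    """Patient-level sensitivity, specificity, and accuracy from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

Accuracy is the prevalence-weighted average of sensitivity and specificity, so when negatives outnumber positives in a cohort, a gain in specificity can outweigh a small loss in sensitivity.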