Abstract
INTRODUCTION: Stroke is a leading cause of long-term disability and affects patients' daily lives and socioeconomic status. Hemorrhagic and ischemic strokes vary in size, shape, and location, which makes automated detection challenging. Magnetic resonance imaging (MRI), particularly diffusion-weighted imaging (DWI), reveals changes in water diffusion within brain tissue, thereby enabling early detection. For early detection, MRI is therefore more accurate than computed tomography (CT), owing to its higher sensitivity.
METHODS: To classify brain strokes, a hybrid model combining a vision transformer (ViT) with a bidirectional long short-term memory (BiLSTM) network was developed using an MRI dataset from a private source. The ViT extracts features from the MRI scans, capturing global contextual and spatial representations through patch-based self-attention (16×16 patches, 256-dimensional projections, four transformer encoder layers with eight attention heads), whereas the BiLSTM network (128 and 64 units) models dependencies within the transformer-encoded features. The hybrid architecture was compared with other deep learning models (accuracy in parentheses): a convolutional neural network (baseline, 85.5%), VGG16 (87.8%), ResNet50 (89.2%), ViT (91.3%), and BiLSTM (88.6%).
RESULTS: The hybrid ViT-BiLSTM model achieved a precision of 97.35%, recall of 93.04%, accuracy of 95.21%, F1-score of 95.15%, and ROC-AUC of 99.36%, outperforming all compared approaches. The standalone ViT achieved an accuracy of 91.3%, exceeding the CNN-based methods. In 5-fold cross-validation, the hybrid ViT-BiLSTM model achieved a mean accuracy of 96.61% with a standard deviation of 0.78, indicating stable performance across folds. These findings support combining bidirectional sequence modeling with transformer-based feature extraction.
CONCLUSION: By capturing global spatial context through self-attention and bidirectional dependencies through recurrent processing, the ViT-BiLSTM model improves stroke classification from MRI data. It is a promising approach for clinical decision support systems in early stroke diagnosis. Future research will use federated learning (FL) to protect privacy and assess model generalizability across multi-institutional MRI datasets.
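Two quantitative details stated above can be made concrete with a few lines of arithmetic: the tensor shapes implied by the ViT and BiLSTM settings in METHODS, and the F1-score implied by the precision and recall in RESULTS. The sketch below is illustrative only and is not the authors' implementation. The 224×224 input size, the absence of a class token, the even split of the embedding across attention heads, and the concatenation of forward and backward BiLSTM states are all assumptions, since the abstract does not specify them.

```python
# Shape and metric arithmetic for the settings stated in the abstract.
# ASSUMPTIONS (not in the source): 224x224 input, no class token,
# embedding split evenly across heads, BiLSTM directions concatenated.

IMAGE_SIZE = 224        # assumed; the abstract does not give the input size
PATCH_SIZE = 16         # from the abstract: 16x16 patches
EMBED_DIM = 256         # from the abstract: 256-dimensional projections
NUM_HEADS = 8           # from the abstract: eight attention heads
LSTM_UNITS = (128, 64)  # from the abstract: BiLSTM with 128 and 64 units


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    per_side = image_size // patch_size
    return per_side * per_side


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Sequence length of patch tokens passed to the transformer encoder.
tokens = num_patches(IMAGE_SIZE, PATCH_SIZE)
# Width of each attention head under an even split (assumed).
head_dim = EMBED_DIM // NUM_HEADS
# Output width per BiLSTM layer: forward and backward states concatenated (assumed).
bilstm_widths = [2 * units for units in LSTM_UNITS]

print("encoder sequence shape:", (tokens, EMBED_DIM))
print("per-head dimension:", head_dim)
print("BiLSTM output widths:", bilstm_widths)
print("F1 from reported precision and recall: %.2f%%"
      % (100 * f1_score(0.9735, 0.9304)))
```

Under these assumptions, each scan becomes a sequence of 196 tokens of width 256, and the two BiLSTM stages emit 256- and 128-dimensional states. The F1 value computed from the reported precision (97.35%) and recall (93.04%) is 95.15%, which agrees with the figure given in RESULTS.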