Abstract
Medical image analysis is essential for accurate disease diagnosis, yet the development of high-performing deep learning models remains limited by the scarcity of large, high-quality annotated datasets. Conventional transfer learning (CTL) from models pretrained on natural images offers partial benefits but frequently suffers from domain mismatch, which limits generalizability to medical imaging tasks. This study reports the development and evaluation of a multistage transfer learning (MSTL) framework designed to improve domain adaptation and enhance diagnostic performance. The MSTL framework introduces an intermediate pretraining stage on cell line microscopic images, providing a more relevant source domain between ImageNet pretraining and downstream medical imaging tasks. The workflow consists of three sequential stages: pretraining on ImageNet, fine-tuning on cell line images, and final adaptation to medical datasets, including mammograms, ultrasounds, and X-rays. MSTL performance was assessed using convolutional neural networks (CNNs) and vision transformers (ViTs) and compared against CTL and training from scratch. ViTs consistently outperformed CNNs, with ViTB-16 achieving the highest accuracy across all datasets. Additionally, three transferability metrics, Log Expected Empirical Prediction (LEEP), Negative Conditional Entropy (NCE), and H-Score, exhibited strong positive correlations with model accuracy, particularly for the mammography and X-ray tasks, where Pearson correlation coefficients for ViTB-16 exceeded 0.95. Overall, the MSTL framework substantially narrowed the gap between general image pretraining and specialized medical imaging tasks. By improving domain adaptation and generalization, it offers a robust and scalable pathway for advancing diagnostic performance in medical image analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-42157-z.
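As an illustration of one of the transferability metrics named above, the following minimal sketch computes the Log Expected Empirical Prediction score from its standard published definition. It is not taken from the study's code: the function name `leep_score`, its argument layout, and the toy inputs are assumptions made for this example. The inputs are the source model's softmax outputs on the target images and the corresponding target labels.

```python
import math


def leep_score(source_probs, target_labels, num_target_classes):
    """Log Expected Empirical Prediction (standard definition).

    source_probs: one probability vector per target sample, produced by the
        source (pretrained) model over its own label set.
    target_labels: integer target labels in [0, num_target_classes).
    Hypothetical helper for illustration; not the study's implementation.
    """
    n = len(target_labels)
    num_source_classes = len(source_probs[0])

    # Empirical joint distribution P(y, z) over target label y and source label z.
    joint = [[0.0] * num_source_classes for _ in range(num_target_classes)]
    for probs, y in zip(source_probs, target_labels):
        for z, p in enumerate(probs):
            joint[y][z] += p / n

    # Marginal P(z), then conditional P(y | z) = P(y, z) / P(z).
    marginal = [
        sum(joint[y][z] for y in range(num_target_classes))
        for z in range(num_source_classes)
    ]
    cond = [
        [joint[y][z] / marginal[z] if marginal[z] > 0 else 0.0
         for z in range(num_source_classes)]
        for y in range(num_target_classes)
    ]

    # Average log-likelihood of the true labels under the expected empirical predictor.
    total = 0.0
    for probs, y in zip(source_probs, target_labels):
        expected_prediction = sum(
            cond[y][z] * probs[z] for z in range(num_source_classes)
        )
        total += math.log(expected_prediction)
    return total / n


# Toy inputs (made up): source outputs that align perfectly with the target labels,
# and source outputs that carry no information about them.
aligned = leep_score([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [0, 1, 0, 1], 2)
uninformative = leep_score([[0.5, 0.5]] * 4, [0, 1, 0, 1], 2)
```

The score is always at most zero, and values closer to zero suggest that the source model's outputs are more predictive of the target labels. In a setup like the one described, such scores would be computed for each candidate source model and then correlated with the fine-tuned accuracy.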