Abstract
OBJECTIVES: To generate high-quality synthetic MR (sMR) images, comparable to pre-treatment localization MR, from dynamic cine-MR using the Swin-ResViT network for target tracking in MR-guided radiotherapy (MRgRT). METHODS: We propose a ResViT model fused with a Swin Transformer module (Swin-ResViT), with an optimized bottleneck layer structure to improve feature extraction efficiency. Seventeen liver cancer patients treated at Sun Yat-sen University Cancer Center between February and July 2024 were retrospectively enrolled; 12 were assigned to the training set (using intra-treatment cine-MR and pre-treatment planning MR) and the remaining 5 to the test set. Image quality and model performance were evaluated by computing the normalized root mean square error (NRMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) between sMR and the reference localization MR, together with the motion marker point error and the model inference speed. RESULTS: Regarding image quality, compared with cine-MR, Swin-ResViT reduced NRMSE and learned perceptual image patch similarity (LPIPS) by 90% and 82%, respectively (P<0.001), and improved PSNR, SSIM, and contrast-to-noise ratio (CNR) by 157%, 79%, and 181%, respectively (P<0.001). Regarding structural accuracy, the mean localization error of motion markers at the right hepatophrenic junction in the generated dynamic sMR sequences was 0.7695±0.7294 mm (P<0.05). Regarding inference speed, the average processing time for a single 224×224-pixel frame on an NVIDIA GeForce RTX 2080 Ti GPU was 15.5 ms for Swin-ResViT compared with 41.4 ms for the ResViT network, a 62% reduction. CONCLUSIONS: The Swin-ResViT model can synthesize high-quality sMR images from cine-MR images. The method combines computational efficiency with substantial image enhancement and is therefore clinically relevant for real-time MRgRT.
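For readers who want to reproduce this kind of comparison, the following is a minimal sketch of two of the image-quality metrics named above (NRMSE and PSNR) and of a relative-reduction calculation of the kind used to compare inference times. It is not the authors' evaluation code. The function names are hypothetical, and two choices are assumptions not stated in the abstract: NRMSE is normalized by the intensity range of the reference image (other conventions divide by the mean or the Euclidean norm), and PSNR takes a user-supplied `data_range` that defaults to 1.0 for images scaled to [0, 1].

```python
import numpy as np


def nrmse(reference, generated):
    """Root mean square error divided by the reference intensity range.

    Assumption: range normalization; the abstract does not state
    which normalization was used.
    """
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    rmse = np.sqrt(np.mean((reference - generated) ** 2))
    value_range = reference.max() - reference.min()
    if value_range == 0:
        raise ValueError("reference image has zero intensity range")
    return rmse / value_range


def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given maximum intensity span.

    Assumption: data_range=1.0, i.e. images normalized to [0, 1].
    """
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    mse = np.mean((reference - generated) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


if __name__ == "__main__":
    # Toy 224x224 frames standing in for a reference MR and a generated sMR.
    rng = np.random.default_rng(0)
    ref = rng.random((224, 224))
    gen = np.clip(ref + rng.normal(0.0, 0.05, ref.shape), 0.0, 1.0)
    print("NRMSE:", nrmse(ref, gen))
    print("PSNR (dB):", psnr(ref, gen))
    # Per-frame times reported in the abstract: ResViT 41.4 ms, Swin-ResViT 15.5 ms.
    print("Time reduction:", relative_reduction(41.4, 15.5))
```

SSIM and LPIPS are omitted from the sketch because they are usually computed with dedicated library implementations rather than written by hand.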