Abstract
BACKGROUND: Multimodal magnetic resonance imaging (MRI) provides complementary tissue contrasts that are essential for accurate clinical diagnosis. However, high scanning costs, long acquisition times, and motion-induced artifacts often result in incomplete or degraded multimodal data. This study aimed to develop a robust synthesis framework that generates missing or artifact-corrupted MRI modalities, thereby improving diagnostic reliability and reducing acquisition burden.

METHODS: We propose a novel multimodal MRI synthesis framework that integrates the global contextual modeling capability of Transformer modules with the multi-frequency local feature extraction of ResOctave blocks in a unified architecture. An attention-based feature fusion module is introduced to facilitate cross-modal feature interaction, enabling structurally faithful, high-quality image synthesis.

RESULTS: On the 2018 Multimodal Brain Tumor Segmentation Challenge (BraTS2018) dataset, the proposed method achieved an average peak signal-to-noise ratio (PSNR) of 32.61 dB, structural similarity index measure (SSIM) of 0.90, and normalized root mean square error (NRMSE) of 0.15 on artifact-free inputs. Under motion-artifact interference, performance remained robust, with a PSNR of 30.02 dB, SSIM of 0.88, and NRMSE of 0.18. Additional experiments confirmed the framework's generalizability, with comparable results on T2 and T1 synthesis tasks from complementary modalities.

CONCLUSIONS: The proposed framework achieves high structural fidelity, fine-detail preservation, and robustness to motion artifacts in multimodal MRI synthesis. These findings support its effectiveness for generating reliable synthetic modalities; future work will extend validation to additional datasets to further assess generalization and clinical applicability.
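To make the METHODS description concrete, the following is a minimal sketch of what an attention-based cross-modal feature fusion module could look like. It is not the authors' implementation: the class name `CrossModalFusion`, the channel sizes, and the choice of multi-head attention over flattened spatial tokens are all illustrative assumptions.

```python
# Hypothetical sketch of attention-based cross-modal feature fusion.
# Not the paper's implementation; names, shapes, and design choices are assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse features from an available modality (e.g., T1) into the
    synthesis stream via multi-head cross-attention over spatial tokens."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, target_feat: torch.Tensor, source_feat: torch.Tensor) -> torch.Tensor:
        # target_feat, source_feat: (B, C, H, W) feature maps
        b, c, h, w = target_feat.shape
        q = target_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
        kv = source_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values
        fused, _ = self.attn(self.norm(q), kv, kv)   # cross-attention over tokens
        fused = fused + q                            # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    fusion = CrossModalFusion(channels=64)
    src = torch.randn(1, 64, 32, 32)   # features of the available modality
    tgt = torch.randn(1, 64, 32, 32)   # features of the synthesis stream
    print(fusion(tgt, src).shape)      # torch.Size([1, 64, 32, 32])
```

In this sketch, the synthesis stream provides the queries and the complementary modality provides the keys and values, so each spatial location of the target can selectively attend to structurally relevant locations in the source, which matches the cross-modal interaction the abstract describes at a high level.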
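The RESULTS metrics (PSNR, SSIM, NRMSE) can be reproduced on any reference/synthesized image pair with standard library functions. The sketch below uses scikit-image on synthetic toy data; the exact normalization convention for NRMSE used in the paper (e.g., Euclidean vs. min-max) is an assumption.

```python
# Evaluation sketch for the reported metrics, using scikit-image on toy data.
# The NRMSE normalization convention is an assumption, not confirmed by the source.
import numpy as np
from skimage.metrics import (
    peak_signal_noise_ratio,
    structural_similarity,
    normalized_root_mse,
)

rng = np.random.default_rng(0)
reference = rng.random((128, 128)).astype(np.float32)  # ground-truth slice (toy)
synthesized = reference + 0.02 * rng.standard_normal((128, 128)).astype(np.float32)

psnr = peak_signal_noise_ratio(reference, synthesized, data_range=1.0)  # in dB
ssim = structural_similarity(reference, synthesized, data_range=1.0)
nrmse = normalized_root_mse(reference, synthesized)  # Euclidean normalization by default

print(f"PSNR={psnr:.2f} dB  SSIM={ssim:.3f}  NRMSE={nrmse:.3f}")
```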