Abstract
Background: The clinical imperative to reduce patients' exposure to ionizing radiation during diagnosis and treatment planning calls for robust, high-fidelity synthetic imaging. Current cross-modal synthesis techniques, primarily based on generative adversarial networks (GANs) and deterministic convolutional neural networks (CNNs), exhibit instability and make critical errors when modeling high-contrast tissues, which limits their reliability in safety-critical applications such as radiotherapy. Objectives: Our primary objective was to develop a stable, high-accuracy framework for 3D Magnetic Resonance Imaging (MRI) to Computed Tomography (CT) synthesis capable of generating clinically equivalent synthetic CTs (sCTs) across multiple anatomical sites. Methods: We introduce a novel 3D Latent Diffusion Model (3DLDM) that operates in a compressed latent space, mitigating the computational burden of 3D diffusion while leveraging the stability of the denoising objective. Results: Across the Head & Neck, Thorax, and Abdomen, the 3DLDM achieved a Mean Absolute Error (MAE) of 56.44 Hounsfield Units (HU). This is 3.63 HU (6.0%) lower than the strongest adversarial baseline, CycleGAN (MAE = 60.07 HU, p < 0.05), 10.76 HU (16.0%) lower than NNUNet (MAE = 67.20 HU, p < 0.01), and 20.79 HU (26.9%) lower than the transformer-based SwinUNeTr (MAE = 77.23 HU, p < 0.0001). The model also achieved the highest structural similarity (SSIM = 0.885 ± 0.031), significantly exceeding SwinUNeTr (p < 0.0001), NNUNet (p < 0.01), and Pix2Pix (p < 0.0001). Likewise, the 3DLDM achieved the highest peak signal-to-noise ratio (PSNR = 29.73 ± 1.60 dB), with statistically significant gains over CycleGAN (p < 0.01), NNUNet (p < 0.001), and SwinUNeTr (p < 0.0001). Conclusions: This work validates a scalable, accurate approach for volumetric synthesis, positioning the 3DLDM to enable MR-only radiotherapy planning and to accelerate radiation-free multi-modal imaging in the clinic.
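The MAE comparisons in the Results can be re-derived from the reported values alone. The following is a minimal sketch, not the authors' evaluation code: it takes the four MAE figures quoted in the abstract and computes, for each baseline, the absolute reduction in HU and the relative reduction in percent. The names `REPORTED_MAE_HU`, `PROPOSED_MAE_HU`, `mae_reduction`, and `summarize` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Baseline MAE values in Hounsfield Units (HU), as quoted in the abstract.
REPORTED_MAE_HU: Dict[str, float] = {
    "CycleGAN": 60.07,
    "NNUNet": 67.20,
    "SwinUNeTr": 77.23,
}

# MAE of the proposed 3DLDM in HU, as quoted in the abstract.
PROPOSED_MAE_HU = 56.44


def mae_reduction(baseline_hu: float, proposed_hu: float) -> Tuple[float, float]:
    """Return (absolute reduction in HU, relative reduction in percent)."""
    absolute = baseline_hu - proposed_hu
    relative = 100.0 * absolute / baseline_hu
    return absolute, relative


def summarize() -> Dict[str, Tuple[float, float]]:
    """Compute both reductions for every baseline, rounded to two decimals."""
    summary = {}
    for name, baseline_hu in REPORTED_MAE_HU.items():
        absolute, relative = mae_reduction(baseline_hu, PROPOSED_MAE_HU)
        summary[name] = (round(absolute, 2), round(relative, 2))
    return summary


if __name__ == "__main__":
    for name, (abs_hu, rel_pct) in summarize().items():
        print(f"{name}: {abs_hu:.2f} HU lower ({rel_pct:.2f}% relative reduction)")
```

Reporting both quantities side by side keeps the absolute difference (in HU) distinct from the relative reduction (in percent of the baseline MAE), which are easy to conflate when only one number is quoted.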