Abstract
Diffusion models excel at generating high-quality images, yet when applied to medical image translation, specifically cone-beam computed tomography (CBCT) to CT translation, they often fail to preserve intricate anatomical details. Conventional methods, including GANs and VAEs, also struggle to produce high-quality translations because of the difficulty of learning a bidirectional CBCT/CT distribution mapping and a lack of robustness to out-of-distribution (OOD) test data. To address these challenges, we propose a denoising diffusion wavelet model (DDWM), which only requires learning the CT data distribution during training and then performs zero-shot CBCT-to-CT translation through a similarity-bridge-controlled reverse diffusion process. In this process, domain-invariant information (e.g., anatomical structures in medical images) from the source image is fused into the result at each step of the reverse diffusion. Specifically, DDWM uses a wavelet transform to decompose the image into different frequency bands and then identifies the bands in which the source and target domains are most similar (i.e., the domain-invariant information). This information is incorporated into each step of the reverse process, preserving the anatomical structures of the original CBCT image and facilitating structure-faithful translation. Trained on a brain CT dataset (Dataset I) and evaluated on three CBCT-to-CT translation datasets (Datasets I-III, of which Datasets II and III are OOD), DDWM outperformed other state-of-the-art methods across all metrics, including Fréchet Inception Distance (FID), Peak Signal-to-Noise Ratio (PSNR), Mean Absolute Error (MAE), and Dice score, demonstrating superior image translation quality and anatomical fidelity.
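The band-fusion idea described above can be illustrated with a minimal sketch. It is not the DDWM implementation: the single-level Haar transform, the function names (`haar_dwt2`, `haar_idwt2`, `fuse_bands`), and the default choice of the low-frequency `LL` band as the shared band are all assumptions made for illustration. The abstract does not state which wavelet is used, which bands are selected, or how the source image is prepared at each step.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2-D Haar transform of an array with even height and width.

    Band names are illustrative; LH/HL naming conventions vary between libraries.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return {
        "LL": (a + b + c + d) / 2.0,  # local average (low frequency)
        "LH": (a - b + c - d) / 2.0,  # difference across columns
        "HL": (a + b - c - d) / 2.0,  # difference across rows
        "HH": (a - b - c + d) / 2.0,  # diagonal difference
    }


def haar_idwt2(bands):
    """Inverse of haar_dwt2: rebuild the image from its four bands."""
    ll, lh, hl, hh = bands["LL"], bands["LH"], bands["HL"], bands["HH"]
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def fuse_bands(x_t, source, shared=("LL",)):
    """Replace the `shared` bands of the current sample with those of the source.

    `shared` stands in for the bands judged domain-invariant (an assumption here).
    """
    sample_bands = haar_dwt2(x_t)
    source_bands = haar_dwt2(source)
    fused = {
        name: source_bands[name] if name in shared else sample_bands[name]
        for name in sample_bands
    }
    return haar_idwt2(fused)
```

In a reverse diffusion loop, a call such as `x_t = fuse_bands(x_t, cbct_image)` would follow each denoising update, where `cbct_image` is a hypothetical source slice with the same shape as `x_t`.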