Abstract
BACKGROUND: Real-time image-guided radiation therapy (IGRT) was first clinically implemented more than 25 years ago but has yet to find widespread adoption. Existing approaches to real-time IGRT require dedicated or specialized equipment that is not available in most treatment centers, and most techniques focus exclusively on targets without tracking the surrounding organs-at-risk (OARs). PURPOSE: To address the need for inexpensive real-time IGRT, we developed Voxelmap, a deep learning framework that achieves 3D respiratory motion estimation and volumetric imaging using the data and resources already available in standard clinical settings. This framework can also be adapted to other imaging modalities such as MRI-Linacs. In contrast with existing approaches, which constrain the solution space with linear priors, Voxelmap encourages diffeomorphic mappings that are topology-preserving and invertible. METHODS: Deformable image registration and forward-projection or slice extraction were used to generate patient-specific training datasets of 3D deformation vector fields (DVFs) and 2D images (or k-space data) from pretreatment 4D-CT or 4D-MRI scans. The XCAT and CoMBAT digital phantoms and the SPARE Grand Challenge Dataset provided synthetic and patient data, respectively. Five network architectures were used to predict 3D DVFs from 2D imaging data. Networks A-C were trained on x-ray images, Network D was trained on MR images, and Network E was trained on k-space data. With Voxelmap, network-generated 3D DVFs were used to warp both the structures contoured on the peak-exhale pretreatment image and the image itself, enabling simultaneous target and OAR tracking and volumetric imaging. For the standard-of-care comparison, contours were expanded to internal target volumes (ITVs).
RESULTS: In validation on digital phantom data for x-ray-guided treatments of cardiac arrhythmia, mean Dice similarity between predicted and ground-truth target and OAR contours for Networks A-C ranged from 0.81 ± 0.05 to 0.82 ± 0.05 and 0.78 ± 0.04 to 0.81 ± 0.04, respectively, while target centroid error ranged from 2.0 ± 0.5 to 2.3 ± 0.9 mm. For MRI-based digital phantom data, mean Dice similarity for target and OAR contours was 0.91 ± 0.06 and 0.90 ± 0.02, respectively, for both Networks D and E, while target centroid error ranged from 1.7 ± 0.8 to 1.8 ± 0.8 mm. For x-ray-based lung cancer patient data, mean Dice similarity for target and OAR contours for Networks A-C ranged from 0.86 ± 0.05 to 0.89 ± 0.04 and 0.94 ± 0.01 to 0.97 ± 0.01, respectively. However, in terms of target centroid error, only Network A outperformed an ITV-based approach at 1.8 ± 0.7 mm, while Networks B and C exhibited larger errors of 2.7 ± 1.2 and 3.5 ± 1.4 mm, respectively. Target volumes dynamically shifted using Voxelmap were 31% smaller than those of the standard-of-care approach. CONCLUSIONS: Voxelmap provides a generalized, open-source tool for intrafraction respiratory motion monitoring and volumetric imaging. Comparing tracking errors across synthetic and patient data revealed that certain network architectures are more robust to the scatter and noise profiles encountered in typical clinical settings. These findings will inform future developments in real-time motion tracking. Our code is available at https://github.com/Image-X-Institute/Voxelmap.
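The two evaluation metrics reported above, Dice similarity and target centroid error, can be illustrated with a minimal NumPy sketch. This is not taken from the Voxelmap repository: the function names (`dice_similarity`, `centroid_mm`, `centroid_error_mm`), the binary-mask representation of contours, and the toy voxel spacing are all assumptions made for illustration.

```python
import numpy as np


def dice_similarity(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


def centroid_mm(mask, spacing_mm):
    """Centroid of a binary mask in mm, given the voxel spacing per axis."""
    idx = np.argwhere(np.asarray(mask, dtype=bool))
    if idx.size == 0:
        raise ValueError("mask is empty")
    return idx.mean(axis=0) * np.asarray(spacing_mm, dtype=float)


def centroid_error_mm(pred, truth, spacing_mm):
    """Euclidean distance in mm between the centroids of two masks."""
    diff = centroid_mm(pred, spacing_mm) - centroid_mm(truth, spacing_mm)
    return float(np.linalg.norm(diff))


# Toy example (hypothetical data): a 4x4x4 voxel cube as the ground-truth
# target, and a "prediction" displaced by one voxel along the first axis.
truth = np.zeros((10, 10, 10), dtype=bool)
truth[2:6, 2:6, 2:6] = True
pred = np.roll(truth, shift=1, axis=0)
spacing = (2.0, 1.0, 1.0)  # assumed voxel size in mm per axis

print(dice_similarity(pred, truth))             # 0.75
print(centroid_error_mm(pred, truth, spacing))  # 2.0
```

In this toy case a one-voxel shift removes one of four slices from the overlap, giving a Dice of 0.75, and the 2 mm voxel size along the shifted axis gives a 2 mm centroid error. In practice, the predicted mask would be the peak-exhale contour warped by a network-generated DVF.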