Abstract
PURPOSE: In deep-learning-based markerless tumor tracking (MTT), model performance degrades under domain shifts caused by noise and anatomical changes. This study aimed to develop a convolutional neural network (CNN) model for real-time MTT segmentation. METHODS: An Uncertain Feature-refinement Attention Unet (UFA-Unet) is proposed. Its design is based on insights into CNN behavior under the domain distribution shift between digitally reconstructed radiographs (DRRs) and kV X-ray fluoroscopic (XF) images. A qualitative ablation study examined the contribution of each UFA-Unet component to segmentation accuracy. The feasibility of UFA-Unet was evaluated in quantitative and phantom studies. The quantitative study included ten lung cancer cases, each with two datasets (1st-plan and 2nd-plan) derived from four-dimensional computed tomography (4DCT) scans acquired a mean of 28 days apart. Patient-specific models were trained on 1st-plan DRRs and validated on noise-injected 1st- and 2nd-plan DRRs. In the phantom study, UFA-Unet was trained on only a single exhalation phase (T50) of the 4DCT data and evaluated on XF images of a dynamic phantom moving with a 25-mm amplitude. UFA-Unet was compared with U-Net, Attention-Unet, and Swin-Unet. RESULTS: The ablation study confirmed that each component suppressed over-activation, thereby improving segmentation accuracy. In the quantitative study, UFA-Unet outperformed the conventional models on noise-injected DRRs from both the 1st and 2nd plans. In the phantom study, UFA-Unet tracked the target robustly through respiratory phases not seen during training, achieving a 95th percentile 3D error of 0.61-3.13 mm and consistently outperforming the conventional models. CONCLUSION: UFA-Unet provides accurate, robust, real-time segmentation, demonstrating its suitability for clinical MTT.
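The abstract reports tracking accuracy as a 95th percentile 3D error but does not define how it is computed. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes one predicted and one reference 3D tumor position (in mm) per fluoroscopic frame, takes the per-frame Euclidean distance as the 3D error, and reports the 95th percentile using linear interpolation (the same convention as NumPy's default percentile). The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one (x, y, z) tumor position in mm per frame.
Point3D = Tuple[float, float, float]


def euclidean_errors(pred: Sequence[Point3D], ref: Sequence[Point3D]) -> List[float]:
    """Per-frame 3D error: Euclidean distance between predicted and reference positions."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of frames")
    return [math.dist(p, r) for p, r in zip(pred, ref)]


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted samples."""
    if not values:
        raise ValueError("values must be non-empty")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0  # fractional rank in the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def p95_3d_error(pred: Sequence[Point3D], ref: Sequence[Point3D]) -> float:
    """Assumed definition of the 95th percentile 3D tracking error (mm)."""
    return percentile(euclidean_errors(pred, ref), 95.0)


if __name__ == "__main__":
    # Toy data: reference fixed at the origin, predictions drifting along x.
    ref = [(0.0, 0.0, 0.0)] * 11
    pred = [(float(i), 0.0, 0.0) for i in range(11)]
    print(f"95th percentile 3D error: {p95_3d_error(pred, ref):.2f} mm")
```

Reporting a high percentile rather than the mean emphasizes the worst-case frames, which matter for margin design in tumor tracking; other percentile conventions (e.g. nearest-rank) would give slightly different values on small samples.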