Abstract
BACKGROUND: Deep learning algorithms can synthesize pulmonary functional images from CT images. However, previous studies have predicted either ventilation or perfusion from CT, but not both, which limits holistic evaluation of lung function. PURPOSE: This study aimed to develop a deep learning-based framework for simultaneously generating lung perfusion and ventilation images from three-dimensional CT. METHODS: A total of 98 cases were collected, each with single-photon emission CT (SPECT) perfusion images (PI) acquired with (99m)Tc-labeled macroaggregated albumin, ventilation images (VI) acquired with (99m)Tc-Technegas, and three-dimensional CT images. The three-dimensional CT and SPECT images were registered and cropped to include only the lung parenchyma. A dual-decoder residual attention network (DDRAN) was constructed to generate both PI and VI simultaneously from three-dimensional CT images. For comparison, a conventional single-decoder residual attention network (RAN) was trained to generate PI and VI separately. The structural similarity index (SSIM) and Spearman's rank correlation coefficient (Rs) were used to assess voxel-wise agreement, and the Dice similarity coefficient (DSC) was used to evaluate function-wise concordance. The Wilcoxon signed-rank test was used to evaluate differences between the images synthesized by DDRAN and RAN. Beyond image-similarity metrics, we evaluated overall model performance using threshold-based classification. Lastly, a two-part reader study was conducted: (I) qualitative image acceptability for clinical review, and (II) illustrative diagnostic interpretation based on synthesized image pairs alone. RESULTS: Overall, DDRAN and RAN achieved comparable performance. The average SSIM values of the DDRAN/RAN models were 0.871/0.866 (p < 0.05) for PI and 0.830/0.825 (p < 0.05) for VI, and the Rs values were 0.836/0.819 for PI and 0.732/0.731 for VI.
The DDRAN/RAN models achieved average DSC values of 0.795/0.796 for PI and 0.708/0.718 for VI in low-function regions, and 0.857/0.849 for PI and 0.793/0.793 for VI in high-function regions. In the two-part reader study, the synthesized perfusion and ventilation images received near-acceptable scores from readers at all experience levels and demonstrated diagnostic potential. CONCLUSIONS: We have developed a dual-decoder residual attention network that enables the simultaneous synthesis of lung perfusion and ventilation images from three-dimensional CT. Preliminary results indicate moderate-to-high structural and functional concordance with the reference images, and the proposed model achieves accuracy comparable to that of single-decoder models. The synthesized perfusion and ventilation images could potentially support precise diagnosis and guide functional lung avoidance radiotherapy.
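
The voxel-wise and function-wise metrics named in METHODS can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the lung `mask` argument, and the percentile cutoffs (`low_pct`, `high_pct`) that define low- and high-function regions are all assumptions made for illustration; the abstract does not state how those regions were thresholded.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)       # highest rank inside each tie group
    first = last - counts + 1      # lowest rank inside each tie group
    return ((first + last) / 2.0)[inverse]

def spearman_rs(pred, ref, mask):
    """Spearman's Rs over lung voxels: Pearson correlation of the ranks."""
    m = np.asarray(mask, dtype=bool)
    rank_pred = average_ranks(np.asarray(pred)[m])
    rank_ref = average_ranks(np.asarray(ref)[m])
    return float(np.corrcoef(rank_pred, rank_ref)[0, 1])

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0                 # both masks empty: treat as full agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)

def function_wise_dsc(pred, ref, mask, low_pct=33.0, high_pct=66.0):
    """DSC of low- and high-function regions.

    Regions are defined per image by percentile cutoffs inside the lung
    mask. The cutoffs are hypothetical, not taken from the study.
    """
    m = np.asarray(mask, dtype=bool)
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)

    def regions(img):
        lo, hi = np.percentile(img[m], [low_pct, high_pct])
        return (img <= lo) & m, (img >= hi) & m

    pred_low, pred_high = regions(pred)
    ref_low, ref_high = regions(ref)
    return dice(pred_low, ref_low), dice(pred_high, ref_high)
```

Computing these metrics per case for each model yields paired samples. A paired comparison such as the Wilcoxon signed-rank test could then be run on them, for example with `scipy.stats.wilcoxon`.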