Abstract
Speed-of-sound (SoS) heterogeneities introduce pronounced artifacts in full-ring photoacoustic tomography (PAT), degrading imaging accuracy and constraining its practical use. We introduce a transfer-learning-based deep-learning framework that couples an ImageNet-pretrained ResNet-50 encoder with a tailored deconvolutional decoder to perform end-to-end artifact correction on PAT reconstructions. We propose a two-phase curriculum learning protocol: initial pretraining on simulations with uniform SoS mismatches, followed by fine-tuning on spatially heterogeneous SoS fields, to improve generalization to complex aberrations. Evaluated on numerical simulations, physical phantom experiments, and in vivo studies, the framework provides substantial gains over conventional back-projection and U-Net baselines in mean squared error, structural similarity index measure, and Pearson correlation coefficient, while achieving an average inference time of 17 ms per frame. These results indicate that the proposed approach can reduce the sensitivity of full-ring PAT to SoS inhomogeneity and improve full-view reconstruction quality.