Abstract
Body part regression is a promising technique for content navigation in computed tomography (CT) images of the human body. This approach involves predicting a position score (a scalar value) for each axial slice in a body CT image. These scores establish a one-dimensional universal coordinate system that can localize content of interest across different images. Current techniques for body part regression mostly rely on self-supervised learning, assuming that the scores change linearly with the slice indices. An important limitation of these techniques is that the predicted scores cannot provide accurate slice localization because they are not calibrated across different images. This work attempts to (1) quantify the theoretically optimal slice localization error achievable with regression scores under the linear assumption (regression on ground-truth scores), and (2) improve slice localization accuracy by incorporating supervised learning using annotations of several key slices in a small dataset. We propose a data augmentation strategy based on selective mix-up, and we compare three common ways of adding supervision: (a) supervised training from scratch; (b) fine-tuning the self-supervised model; and (c) semi-supervised training. Our results show that the fine-tuned model with the selective mix-up augmentation achieves a slice localization error (7.4±6.9 mm) close to the theoretically optimal slice localization error under the linear assumption (6.1±6.5 mm), outperforming all self-supervised and semi-supervised counterparts (>10 mm). Thus, the proposed model is a good candidate for an accurate content navigation tool for body CT images.
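To make the linear assumption and the notion of slice localization error concrete, the following minimal sketch fits a per-volume line to slice scores and looks up the slice closest to a target score. It is an illustration only, not the method evaluated in this work: the function names, the toy volume (5 slices spaced 5 mm apart), and the score values are all hypothetical.

```python
import numpy as np


def fit_linear_scores(z_mm, scores):
    """Least-squares fit of score = slope * z + intercept for one CT volume."""
    slope, intercept = np.polyfit(z_mm, scores, deg=1)
    return float(slope), float(intercept)


def localize_slice(z_mm, scores, target_score):
    """Return the z position (mm) of the slice whose score is closest to target_score."""
    z_mm = np.asarray(z_mm, dtype=float)
    scores = np.asarray(scores, dtype=float)
    idx = int(np.argmin(np.abs(scores - target_score)))
    return float(z_mm[idx])


# Hypothetical toy volume: 5 axial slices, 5 mm apart, with perfectly linear scores.
z_mm = np.arange(0.0, 25.0, 5.0)
scores = 0.02 * z_mm + 0.1

slope, intercept = fit_linear_scores(z_mm, scores)

# Assume the content of interest truly lies at z = 10 mm and its reference score is 0.31.
true_z_mm = 10.0
predicted_z_mm = localize_slice(z_mm, scores, target_score=0.31)

# Slice localization error: distance in mm between predicted and true slice positions.
error_mm = abs(predicted_z_mm - true_z_mm)
```

In this toy case the scores are exactly linear, so the lookup recovers the correct slice. With real predicted scores that are not calibrated across images, the same lookup would return a displaced slice, and that displacement in millimetres is what the localization error measures.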